Source: data_layer/docs/SCHEMA_SDL_MIGRATION_ANALYSIS.md
GraphQL SDL Migration Analysis
Current State Assessment
What You Have (β Strong Foundation)
Your current schema system is well-architected and already follows many best practices:
database/schemas/
βββ domain/v1/ # β
Single Source of Truth (JSON Schema)
β βββ classification/
β βββ forms/
β βββ match-types/
β βββ types/
β
βββ generated/adapters/ # β
Multi-target code generation
β βββ python/v1/ # Pydantic models
β βββ typescript/v1/ # TS interfaces + Zod
β βββ drizzle/v1/ # Drizzle ORM
β βββ neo4j/v1/ # Graph stubs
β βββ postgresql/v1/ # SQL DDL stubs
β βββ redshift/v1/ # Redshift DDL stubs
β
βββ infrastructure/ # β
Infrastructure layer separation
βββ postgresql/ # Raw SQL DDL
βββ prisma/ # Prisma schemasCurrent Architecture Strengths
-
β Separation of Concerns
- Domain schemas (business logic) β Infrastructure (database-specific)
- Clean boundary between canonical definitions and generated code
-
β Multi-Target Generation
- Already generating Python, TypeScript, Drizzle from JSON Schema
./scripts/regenerate_adapters.shorchestrates everything- Version management (v1, v2, etc.) built-in
-
β Clear Import Hierarchy
# Source of truth database/schemas/domain/v1/tier.schema.json # Generated adapter (use this) from database.schemas.generated.adapters.python.v1.tier import TierClassification # Extended with business logic (optional) from app.models.tier_models import EnhancedTierClassification -
β Database-Specific Infrastructure
- PostgreSQL DDL in
infrastructure/postgresql/ - Prisma schema in
infrastructure/prisma/ - Already separate from domain layer
- PostgreSQL DDL in
Current Architecture Gaps
1. β No First-Class Relationships
JSON Schema struggles with relationships:
// Current: tier.schema.json
{
"properties": {
"tier_name": {"type": "string"},
"description": {"type": "string"}
}
}
// Current: league.schema.json
{
"properties": {
"name": {"type": "string"},
"tier": {"type": "string"} // β Just a string, no relationship metadata
}
}What's missing:
- Cardinality (1:1, 1:many, many:many)
- Relationship direction and names
- Relationship properties (e.g.,
since_dateon a partnership) - Bidirectional navigation
2. β Limited Graph Capabilities
Your Neo4j adapter generates stubs, not real Cypher:
# database/schemas/generated/adapters/neo4j/v1/tier.py (current)
"""
Generated Neo4j adapter for tier.schema.json
TODO: Implement actual Cypher generation
"""What you want:
// Auto-generated constraints and indexes
CREATE CONSTRAINT tier_name IF NOT EXISTS FOR (t:Tier) REQUIRE t.tier_name IS UNIQUE;
CREATE INDEX tier_description IF NOT EXISTS FOR (t:Tier) ON (t.description);
// Auto-generated relationship queries
MATCH (l:League)-[:HAS_TIER]->(t:Tier)
WHERE t.tier_name = $tier_name
RETURN l;3. β Manual SQL/Prisma Synchronization
Currently, you have two separate systems:
database/schemas/domain/v1/β generates Pydantic/TS/Drizzledatabase/schemas/infrastructure/postgresql/β hand-written SQL DDLdatabase/schemas/infrastructure/prisma/schema.prismaβ hand-written Prisma
The problem: Changes to domain schemas require manual updates to SQL/Prisma.
4. β No Cross-Language Consistency for Relationships
When you define a relationship, you want it consistent across:
- Postgres foreign keys
- Neo4j edges
- Prisma relations
- Pydantic validators
- TypeScript types
Current state: Each system has different relationship definitions.
GraphQL SDL/LinkML Proposal
What It Solves
GraphQL SDL (with directives) addresses all four gaps:
# database/schemas/domain/v1/sdl/league_tier.graphql
type Tier @id(field: "tier_name") @unique(fields: ["tier_name"]) {
tier_name: String! @id
description: String
priority_score: Int
# First-class relationship definition
leagues: [League!]! @relationship(type: "HAS_TIER", direction: IN)
}
type League @id(field: "league_id") {
league_id: ID! @id
name: String!
sport: String!
# Relationship with cardinality and direction
tier: Tier! @relationship(type: "HAS_TIER", direction: OUT)
teams: [Team!]! @relationship(type: "HAS_TEAM", direction: OUT)
}
type Team @id(field: "team_id") {
team_id: ID! @id
name: String!
league: League! @relationship(type: "HAS_TEAM", direction: IN)
}What Gets Generated
From this single SDL file, generate:
1. PostgreSQL DDL
CREATE TABLE tiers (
tier_name VARCHAR(100) PRIMARY KEY,
description TEXT,
priority_score INT
);
CREATE TABLE leagues (
league_id UUID PRIMARY KEY,
name VARCHAR(255) NOT NULL,
sport VARCHAR(100) NOT NULL,
tier_name VARCHAR(100) NOT NULL REFERENCES tiers(tier_name)
);
CREATE TABLE teams (
team_id UUID PRIMARY KEY,
name VARCHAR(255) NOT NULL,
league_id UUID NOT NULL REFERENCES leagues(league_id)
);2. Neo4j Cypher
// Constraints
CREATE CONSTRAINT tier_unique IF NOT EXISTS FOR (t:Tier) REQUIRE t.tier_name IS UNIQUE;
CREATE CONSTRAINT league_unique IF NOT EXISTS FOR (l:League) REQUIRE l.league_id IS UNIQUE;
// Indexes
CREATE INDEX tier_priority IF NOT EXISTS FOR (t:Tier) ON (t.priority_score);
// Sample relationship creation (in application code)
MATCH (l:League {league_id: $league_id})
MATCH (t:Tier {tier_name: $tier_name})
MERGE (l)-[:HAS_TIER]->(t);3. Prisma Schema
model Tier {
tier_name String @id
description String?
priority_score Int?
leagues League[] @relation("HAS_TIER")
}
model League {
league_id String @id @default(uuid())
name String
sport String
tier_name String
tier Tier @relation("HAS_TIER", fields: [tier_name], references: [tier_name])
teams Team[] @relation("HAS_TEAM")
}
model Team {
team_id String @id @default(uuid())
name String
league_id String
league League @relation("HAS_TEAM", fields: [league_id], references: [league_id])
}4. Pydantic Models (with relationships)
from pydantic import BaseModel, Field
from typing import List, Optional
class Tier(BaseModel):
tier_name: str = Field(..., description="Primary identifier")
description: Optional[str] = None
priority_score: Optional[int] = None
# Relationship metadata
class Config:
relationships = {
"leagues": {"type": "League", "cardinality": "many", "direction": "in"}
}
class League(BaseModel):
league_id: str = Field(..., description="Primary identifier")
name: str
sport: str
tier_name: str
class Config:
relationships = {
"tier": {"type": "Tier", "cardinality": "one", "direction": "out"},
"teams": {"type": "Team", "cardinality": "many", "direction": "out"}
}5. TypeScript Types + Zod
import { z } from 'zod';
export const TierSchema = z.object({
tier_name: z.string(),
description: z.string().optional(),
priority_score: z.number().optional(),
});
export type Tier = z.infer<typeof TierSchema>;
export const LeagueSchema = z.object({
league_id: z.string().uuid(),
name: z.string(),
sport: z.string(),
tier_name: z.string(),
});
export type League = z.infer<typeof LeagueSchema>;
// Relationship metadata
export const LeagueRelationships = {
tier: { type: 'Tier', cardinality: 'one' },
teams: { type: 'Team', cardinality: 'many' },
} as const;Migration Path: Hybrid Approach
Recommended Strategy: Incremental Hybrid
Don't throw away your current system. Layer SDL on top of it:
database/schemas/
βββ domain/
β βββ v1/ # KEEP: JSON Schema for basic types
β β βββ types/ # Primitives, enums, simple objects
β β βββ forms/ # Form definitions (JSON Schema works great here)
β β
β βββ sdl/ # NEW: GraphQL SDL for entities + relationships
β βββ v1/
β βββ league_tier.graphql
β βββ contracts.graphql
β βββ combat_sports.graphql
β βββ team_sports.graphql
β
βββ generated/adapters/ # ENHANCED: Add relationship metadata
β βββ python/v1/
β βββ typescript/v1/
β βββ ...
β
βββ infrastructure/ # AUTO-GENERATED: From SDL
βββ postgresql/
β βββ generated_from_sdl/ # Auto-generated DDL
βββ prisma/
β βββ generated.schema.prisma # Auto-generated from SDL
βββ neo4j/
βββ generated_constraints.cypher # Auto-generated from SDLMigration Phases
Phase 1: Proof of Concept (1-2 weeks)
Goal: Validate SDL codegen for one domain (league tiers)
Tasks:
- Create
database/schemas/domain/sdl/v1/league_tier.graphql - Write codegen script
database/setup/generate_from_sdl.py - Generate:
- PostgreSQL DDL
- Prisma schema
- Neo4j constraints
- Enhanced Pydantic models
- Test: Can you create League β Tier relationships in all three systems?
Success Criteria:
- Single SDL file generates working Postgres tables, Prisma models, and Neo4j constraints
- Relationship consistency verified across all three
Phase 2: Core Entities (2-3 weeks)
Goal: Migrate core business entities to SDL
Entities to migrate:
- Leagues + Tiers
- Contracts + Terms
- Teams + Players
- Partnerships + Contacts
Keep as JSON Schema:
- Form definitions (onboarding forms, questionnaires)
- Simple enums and primitives
- Validation-only schemas
Phase 3: Sport-Specific Models (3-4 weeks)
Goal: Add sport-specific entities (combat, racing, team sports)
Approach:
# Shared base
interface SportEvent {
event_id: ID!
date: DateTime!
venue: String
}
# Sport-specific implementations
type CombatBout implements SportEvent {
event_id: ID!
date: DateTime!
venue: String
fighter_a: Fighter! @relationship(type: "FIGHTS_IN", direction: IN)
fighter_b: Fighter! @relationship(type: "FIGHTS_IN", direction: IN)
weight_class: String!
method_of_victory: String
}
type RaceEvent implements SportEvent {
event_id: ID!
date: DateTime!
venue: String
track_length: Float!
weather_conditions: String
participants: [Driver!]! @relationship(type: "COMPETES_IN", direction: IN)
}Phase 4: Full Automation (Ongoing)
Goal: One command regenerates everything
# Regenerate all adapters from SDL + JSON Schema
./scripts/regenerate_all_schemas.sh
# Internally calls:
python database/setup/generate_from_sdl.py # SDL β SQL/Prisma/Neo4j/Pydantic
python database/setup/generate_adapters.py # JSON Schema β Pydantic/TS/ZodImplementation: SDL Codegen Script
Minimal Codegen Architecture
# database/setup/generate_from_sdl.py
from pathlib import Path
from typing import List
import graphql
SDL_ROOT = Path("database/schemas/domain/sdl/v1")
INFRA_ROOT = Path("database/schemas/infrastructure")
def parse_sdl(sdl_path: Path) -> graphql.DocumentNode:
"""Parse GraphQL SDL file."""
return graphql.parse(sdl_path.read_text())
def generate_postgresql(schema: graphql.DocumentNode) -> str:
"""Generate PostgreSQL DDL from SDL."""
ddl = []
for definition in schema.definitions:
if isinstance(definition, graphql.ObjectTypeDefinitionNode):
ddl.append(generate_table(definition))
return "\n\n".join(ddl)
def generate_table(type_def: graphql.ObjectTypeDefinitionNode) -> str:
"""Generate CREATE TABLE statement."""
table_name = type_def.name.value.lower() + "s" # Simple pluralization
columns = []
for field in type_def.fields:
# Check for @id directive
is_id = any(d.name.value == "id" for d in (field.directives or []))
column = generate_column(field, is_id)
columns.append(column)
return f"CREATE TABLE {table_name} (\n " + ",\n ".join(columns) + "\n);"
def generate_column(field: graphql.FieldDefinitionNode, is_id: bool) -> str:
"""Generate column definition."""
field_name = field.name.value
field_type = graphql_to_sql_type(field.type)
constraints = []
if is_id:
constraints.append("PRIMARY KEY")
if isinstance(field.type, graphql.NonNullTypeNode):
constraints.append("NOT NULL")
return f"{field_name} {field_type} " + " ".join(constraints)
def graphql_to_sql_type(gql_type) -> str:
"""Map GraphQL type to SQL type."""
# Unwrap NonNull and List types
if isinstance(gql_type, graphql.NonNullTypeNode):
gql_type = gql_type.type
if isinstance(gql_type, graphql.ListTypeNode):
return "JSONB" # Store arrays as JSONB
type_name = gql_type.name.value
mapping = {
"ID": "UUID",
"String": "VARCHAR(255)",
"Int": "INTEGER",
"Float": "DOUBLE PRECISION",
"Boolean": "BOOLEAN",
"DateTime": "TIMESTAMPTZ",
}
return mapping.get(type_name, "TEXT")
# Similar generators for Prisma, Neo4j, Pydantic...Generated Outputs
From SDL:
infrastructure/
βββ postgresql/
β βββ generated/
β βββ league_tier.sql # β
Auto-generated
β
βββ prisma/
β βββ generated/
β βββ league_tier.prisma # β
Auto-generated
β
βββ neo4j/
βββ generated/
βββ league_tier_constraints.cypher # β
Auto-generatedDecision Framework
When to Use SDL vs JSON Schema
| Use Case | Use SDL | Use JSON Schema |
|---|---|---|
| Entities with relationships (Leagues, Teams, Contracts) | β SDL | β |
| Graph queries (Neo4j relationships) | β SDL | β |
| Cross-database consistency (Postgres + Neo4j + Prisma) | β SDL | β οΈ Possible but harder |
| Form definitions (onboarding forms, questionnaires) | β | β JSON Schema |
| Simple types/enums (tier names, statuses) | β οΈ Either | β JSON Schema (easier) |
| API request/response validation | β οΈ Either | β JSON Schema |
| AI-guided transformations | β οΈ Either | β JSON Schema (better tooling) |
Hybrid Strategy
Use SDL for:
- Entities (
League,Team,Player,Contract) - Relationships (
League -[HAS_TIER]-> Tier) - Multi-database consistency (Postgres + Neo4j)
Keep JSON Schema for:
- Forms and questionnaires
- API contracts
- Validation rules
- Simple types (enums, primitives)
Effort Estimate
Phase 1: Proof of Concept (1-2 weeks)
- Effort: 20-30 hours
- Deliverable: Working codegen for one domain (league tiers)
- Risk: Low (doesn't break existing system)
Phase 2: Core Entities (2-3 weeks)
- Effort: 40-60 hours
- Deliverable: Leagues, Contracts, Teams, Partnerships in SDL
- Risk: Medium (parallel to existing system, migration needed)
Phase 3: Sport-Specific Models (3-4 weeks)
- Effort: 60-80 hours
- Deliverable: Combat, Racing, Team sports in SDL
- Risk: Medium (complex domain modeling)
Total: 8-12 weeks (120-170 hours)
Recommendation
β Yes, adopt SDL β but incrementally
Why:
- You already have the foundation: Multi-target codegen, version management, separation of concerns
- SDL fills critical gaps: First-class relationships, Neo4j integration, cross-database consistency
- Low risk: Layer SDL on top of existing JSON Schema (hybrid approach)
- High ROI: One SDL file β Postgres + Prisma + Neo4j + Pydantic + TypeScript
Start Small
- This week: Create
league_tier.graphqland basic codegen script - Next week: Generate Postgres DDL + Prisma schema + Neo4j constraints
- Validate: Can you create League β Tier relationships in all three databases?
- Iterate: Add more entities once the pattern works
Key Principle
Keep what works (JSON Schema for forms, validation), add what's missing (SDL for entities, relationships).
Next Steps
- Review this analysis β Does the hybrid approach make sense for your use case?
- Prototype SDL codegen β Create
generate_from_sdl.pyfor Postgres + Prisma + Neo4j - Pick one domain β Start with Leagues + Tiers (well-understood, critical to your business)
- Test end-to-end β Can you create relationships in all three systems from one SDL file?
- Decide on migration β If POC succeeds, plan phased migration of core entities
Questions to answer before proceeding:
- How critical is Neo4j graph querying to your roadmap? (High = prioritize SDL)
- How often do your entity relationships change? (Often = SDL pays off quickly)
- Do you have bandwidth for 1-2 weeks of POC work? (If yes, start this week)
- Are you comfortable with hybrid SDL + JSON Schema? (Or prefer single system?)
Let me know your thoughts and I can help with the POC implementation!