Architecture
GraphQL SDL Migration Analysis

Source: data_layer/docs/SCHEMA_SDL_MIGRATION_ANALYSIS.md

GraphQL SDL Migration Analysis

Current State Assessment

What You Have (βœ… Strong Foundation)

Your current schema system is well-architected and already follows many best practices:

database/schemas/
β”œβ”€β”€ domain/v1/              # βœ… Single Source of Truth (JSON Schema)
β”‚   β”œβ”€β”€ classification/
β”‚   β”œβ”€β”€ forms/
β”‚   β”œβ”€β”€ match-types/
β”‚   └── types/
β”‚
β”œβ”€β”€ generated/adapters/     # βœ… Multi-target code generation
β”‚   β”œβ”€β”€ python/v1/          # Pydantic models
β”‚   β”œβ”€β”€ typescript/v1/      # TS interfaces + Zod
β”‚   β”œβ”€β”€ drizzle/v1/         # Drizzle ORM
β”‚   β”œβ”€β”€ neo4j/v1/           # Graph stubs
β”‚   β”œβ”€β”€ postgresql/v1/      # SQL DDL stubs
β”‚   └── redshift/v1/        # Redshift DDL stubs
β”‚
└── infrastructure/         # βœ… Infrastructure layer separation
    β”œβ”€β”€ postgresql/         # Raw SQL DDL
    └── prisma/            # Prisma schemas

Current Architecture Strengths

  1. βœ… Separation of Concerns

    • Domain schemas (business logic) β‰  Infrastructure (database-specific)
    • Clean boundary between canonical definitions and generated code
  2. βœ… Multi-Target Generation

    • Already generating Python, TypeScript, Drizzle from JSON Schema
    • ./scripts/regenerate_adapters.sh orchestrates everything
    • Version management (v1, v2, etc.) built-in
  3. βœ… Clear Import Hierarchy

    # Source of truth
    database/schemas/domain/v1/tier.schema.json
     
    # Generated adapter (use this)
    from database.schemas.generated.adapters.python.v1.tier import TierClassification
     
    # Extended with business logic (optional)
    from app.models.tier_models import EnhancedTierClassification
  4. βœ… Database-Specific Infrastructure

    • PostgreSQL DDL in infrastructure/postgresql/
    • Prisma schema in infrastructure/prisma/
    • Already separate from domain layer

Current Architecture Gaps

1. ❌ No First-Class Relationships

JSON Schema struggles with relationships:

// Current: tier.schema.json
{
  "properties": {
    "tier_name": {"type": "string"},
    "description": {"type": "string"}
  }
}
 
// Current: league.schema.json
{
  "properties": {
    "name": {"type": "string"},
    "tier": {"type": "string"}  // ❌ Just a string, no relationship metadata
  }
}

What's missing:

  • Cardinality (1:1, 1:many, many:many)
  • Relationship direction and names
  • Relationship properties (e.g., since_date on a partnership)
  • Bidirectional navigation

2. ❌ Limited Graph Capabilities

Your Neo4j adapter generates stubs, not real Cypher:

# database/schemas/generated/adapters/neo4j/v1/tier.py (current)
"""
Generated Neo4j adapter for tier.schema.json
TODO: Implement actual Cypher generation
"""

What you want:

// Auto-generated constraints and indexes
CREATE CONSTRAINT tier_name IF NOT EXISTS FOR (t:Tier) REQUIRE t.tier_name IS UNIQUE;
CREATE INDEX tier_description IF NOT EXISTS FOR (t:Tier) ON (t.description);
 
// Auto-generated relationship queries
MATCH (l:League)-[:HAS_TIER]->(t:Tier)
WHERE t.tier_name = $tier_name
RETURN l;

3. ❌ Manual SQL/Prisma Synchronization

Currently, you have two separate systems:

  • database/schemas/domain/v1/ β†’ generates Pydantic/TS/Drizzle
  • database/schemas/infrastructure/postgresql/ β†’ hand-written SQL DDL
  • database/schemas/infrastructure/prisma/schema.prisma β†’ hand-written Prisma

The problem: Changes to domain schemas require manual updates to SQL/Prisma.

4. ❌ No Cross-Language Consistency for Relationships

When you define a relationship, you want it consistent across:

  • Postgres foreign keys
  • Neo4j edges
  • Prisma relations
  • Pydantic validators
  • TypeScript types

Current state: Each system has different relationship definitions.


GraphQL SDL/LinkML Proposal

What It Solves

GraphQL SDL (with directives) addresses all four gaps:

# database/schemas/domain/v1/sdl/league_tier.graphql
 
type Tier @id(field: "tier_name") @unique(fields: ["tier_name"]) {
  tier_name: String! @id
  description: String
  priority_score: Int
 
  # First-class relationship definition
  leagues: [League!]! @relationship(type: "HAS_TIER", direction: IN)
}
 
type League @id(field: "league_id") {
  league_id: ID! @id
  name: String!
  sport: String!
 
  # Relationship with cardinality and direction
  tier: Tier! @relationship(type: "HAS_TIER", direction: OUT)
  teams: [Team!]! @relationship(type: "HAS_TEAM", direction: OUT)
}
 
type Team @id(field: "team_id") {
  team_id: ID! @id
  name: String!
  league: League! @relationship(type: "HAS_TEAM", direction: IN)
}

What Gets Generated

From this single SDL file, generate:

1. PostgreSQL DDL

CREATE TABLE tiers (
  tier_name VARCHAR(100) PRIMARY KEY,
  description TEXT,
  priority_score INT
);
 
CREATE TABLE leagues (
  league_id UUID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  sport VARCHAR(100) NOT NULL,
  tier_name VARCHAR(100) NOT NULL REFERENCES tiers(tier_name)
);
 
CREATE TABLE teams (
  team_id UUID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  league_id UUID NOT NULL REFERENCES leagues(league_id)
);

2. Neo4j Cypher

// Constraints
CREATE CONSTRAINT tier_unique IF NOT EXISTS FOR (t:Tier) REQUIRE t.tier_name IS UNIQUE;
CREATE CONSTRAINT league_unique IF NOT EXISTS FOR (l:League) REQUIRE l.league_id IS UNIQUE;
 
// Indexes
CREATE INDEX tier_priority IF NOT EXISTS FOR (t:Tier) ON (t.priority_score);
 
// Sample relationship creation (in application code)
MATCH (l:League {league_id: $league_id})
MATCH (t:Tier {tier_name: $tier_name})
MERGE (l)-[:HAS_TIER]->(t);

3. Prisma Schema

model Tier {
  tier_name      String   @id
  description    String?
  priority_score Int?
  leagues        League[] @relation("HAS_TIER")
}
 
model League {
  league_id String @id @default(uuid())
  name      String
  sport     String
  tier_name String
  tier      Tier   @relation("HAS_TIER", fields: [tier_name], references: [tier_name])
  teams     Team[] @relation("HAS_TEAM")
}
 
model Team {
  team_id   String @id @default(uuid())
  name      String
  league_id String
  league    League @relation("HAS_TEAM", fields: [league_id], references: [league_id])
}

4. Pydantic Models (with relationships)

from pydantic import BaseModel, Field
from typing import List, Optional
 
class Tier(BaseModel):
    tier_name: str = Field(..., description="Primary identifier")
    description: Optional[str] = None
    priority_score: Optional[int] = None
 
    # Relationship metadata
    class Config:
        relationships = {
            "leagues": {"type": "League", "cardinality": "many", "direction": "in"}
        }
 
class League(BaseModel):
    league_id: str = Field(..., description="Primary identifier")
    name: str
    sport: str
    tier_name: str
 
    class Config:
        relationships = {
            "tier": {"type": "Tier", "cardinality": "one", "direction": "out"},
            "teams": {"type": "Team", "cardinality": "many", "direction": "out"}
        }

5. TypeScript Types + Zod

import { z } from 'zod';
 
export const TierSchema = z.object({
  tier_name: z.string(),
  description: z.string().optional(),
  priority_score: z.number().optional(),
});
 
export type Tier = z.infer<typeof TierSchema>;
 
export const LeagueSchema = z.object({
  league_id: z.string().uuid(),
  name: z.string(),
  sport: z.string(),
  tier_name: z.string(),
});
 
export type League = z.infer<typeof LeagueSchema>;
 
// Relationship metadata
export const LeagueRelationships = {
  tier: { type: 'Tier', cardinality: 'one' },
  teams: { type: 'Team', cardinality: 'many' },
} as const;

Migration Path: Hybrid Approach

Recommended Strategy: Incremental Hybrid

Don't throw away your current system. Layer SDL on top of it:

database/schemas/
β”œβ”€β”€ domain/
β”‚   β”œβ”€β”€ v1/                    # KEEP: JSON Schema for basic types
β”‚   β”‚   β”œβ”€β”€ types/             # Primitives, enums, simple objects
β”‚   β”‚   └── forms/             # Form definitions (JSON Schema works great here)
β”‚   β”‚
β”‚   └── sdl/                   # NEW: GraphQL SDL for entities + relationships
β”‚       └── v1/
β”‚           β”œβ”€β”€ league_tier.graphql
β”‚           β”œβ”€β”€ contracts.graphql
β”‚           β”œβ”€β”€ combat_sports.graphql
β”‚           └── team_sports.graphql
β”‚
β”œβ”€β”€ generated/adapters/        # ENHANCED: Add relationship metadata
β”‚   β”œβ”€β”€ python/v1/
β”‚   β”œβ”€β”€ typescript/v1/
β”‚   └── ...
β”‚
└── infrastructure/            # AUTO-GENERATED: From SDL
    β”œβ”€β”€ postgresql/
    β”‚   └── generated_from_sdl/  # Auto-generated DDL
    β”œβ”€β”€ prisma/
    β”‚   └── generated.schema.prisma  # Auto-generated from SDL
    └── neo4j/
        └── generated_constraints.cypher  # Auto-generated from SDL

Migration Phases

Phase 1: Proof of Concept (1-2 weeks)

Goal: Validate SDL codegen for one domain (league tiers)

Tasks:

  1. Create database/schemas/domain/sdl/v1/league_tier.graphql
  2. Write codegen script database/setup/generate_from_sdl.py
  3. Generate:
    • PostgreSQL DDL
    • Prisma schema
    • Neo4j constraints
    • Enhanced Pydantic models
  4. Test: Can you create League β†’ Tier relationships in all three systems?

Success Criteria:

  • Single SDL file generates working Postgres tables, Prisma models, and Neo4j constraints
  • Relationship consistency verified across all three

Phase 2: Core Entities (2-3 weeks)

Goal: Migrate core business entities to SDL

Entities to migrate:

  • Leagues + Tiers
  • Contracts + Terms
  • Teams + Players
  • Partnerships + Contacts

Keep as JSON Schema:

  • Form definitions (onboarding forms, questionnaires)
  • Simple enums and primitives
  • Validation-only schemas

Phase 3: Sport-Specific Models (3-4 weeks)

Goal: Add sport-specific entities (combat, racing, team sports)

Approach:

# Shared base
interface SportEvent {
  event_id: ID!
  date: DateTime!
  venue: String
}
 
# Sport-specific implementations
type CombatBout implements SportEvent {
  event_id: ID!
  date: DateTime!
  venue: String
  fighter_a: Fighter! @relationship(type: "FIGHTS_IN", direction: IN)
  fighter_b: Fighter! @relationship(type: "FIGHTS_IN", direction: IN)
  weight_class: String!
  method_of_victory: String
}
 
type RaceEvent implements SportEvent {
  event_id: ID!
  date: DateTime!
  venue: String
  track_length: Float!
  weather_conditions: String
  participants: [Driver!]! @relationship(type: "COMPETES_IN", direction: IN)
}

Phase 4: Full Automation (Ongoing)

Goal: One command regenerates everything

# Regenerate all adapters from SDL + JSON Schema
./scripts/regenerate_all_schemas.sh
 
# Internally calls:
python database/setup/generate_from_sdl.py    # SDL β†’ SQL/Prisma/Neo4j/Pydantic
python database/setup/generate_adapters.py    # JSON Schema β†’ Pydantic/TS/Zod

Implementation: SDL Codegen Script

Minimal Codegen Architecture

# database/setup/generate_from_sdl.py
 
from pathlib import Path
from typing import List
import graphql
 
SDL_ROOT = Path("database/schemas/domain/sdl/v1")
INFRA_ROOT = Path("database/schemas/infrastructure")
 
def parse_sdl(sdl_path: Path) -> graphql.DocumentNode:
    """Parse GraphQL SDL file."""
    return graphql.parse(sdl_path.read_text())
 
def generate_postgresql(schema: graphql.DocumentNode) -> str:
    """Generate PostgreSQL DDL from SDL."""
    ddl = []
    for definition in schema.definitions:
        if isinstance(definition, graphql.ObjectTypeDefinitionNode):
            ddl.append(generate_table(definition))
    return "\n\n".join(ddl)
 
def generate_table(type_def: graphql.ObjectTypeDefinitionNode) -> str:
    """Generate CREATE TABLE statement."""
    table_name = type_def.name.value.lower() + "s"  # Simple pluralization
    columns = []
 
    for field in type_def.fields:
        # Check for @id directive
        is_id = any(d.name.value == "id" for d in (field.directives or []))
        column = generate_column(field, is_id)
        columns.append(column)
 
    return f"CREATE TABLE {table_name} (\n  " + ",\n  ".join(columns) + "\n);"
 
def generate_column(field: graphql.FieldDefinitionNode, is_id: bool) -> str:
    """Generate column definition."""
    field_name = field.name.value
    field_type = graphql_to_sql_type(field.type)
 
    constraints = []
    if is_id:
        constraints.append("PRIMARY KEY")
    if isinstance(field.type, graphql.NonNullTypeNode):
        constraints.append("NOT NULL")
 
    return f"{field_name} {field_type} " + " ".join(constraints)
 
def graphql_to_sql_type(gql_type) -> str:
    """Map GraphQL type to SQL type."""
    # Unwrap NonNull and List types
    if isinstance(gql_type, graphql.NonNullTypeNode):
        gql_type = gql_type.type
    if isinstance(gql_type, graphql.ListTypeNode):
        return "JSONB"  # Store arrays as JSONB
 
    type_name = gql_type.name.value
 
    mapping = {
        "ID": "UUID",
        "String": "VARCHAR(255)",
        "Int": "INTEGER",
        "Float": "DOUBLE PRECISION",
        "Boolean": "BOOLEAN",
        "DateTime": "TIMESTAMPTZ",
    }
    return mapping.get(type_name, "TEXT")
 
# Similar generators for Prisma, Neo4j, Pydantic...

Generated Outputs

From SDL:

infrastructure/
β”œβ”€β”€ postgresql/
β”‚   └── generated/
β”‚       └── league_tier.sql          # βœ… Auto-generated
β”‚
β”œβ”€β”€ prisma/
β”‚   └── generated/
β”‚       └── league_tier.prisma       # βœ… Auto-generated
β”‚
└── neo4j/
    └── generated/
        └── league_tier_constraints.cypher  # βœ… Auto-generated

Decision Framework

When to Use SDL vs JSON Schema

Use CaseUse SDLUse JSON Schema
Entities with relationships (Leagues, Teams, Contracts)βœ… SDL❌
Graph queries (Neo4j relationships)βœ… SDL❌
Cross-database consistency (Postgres + Neo4j + Prisma)βœ… SDL⚠️ Possible but harder
Form definitions (onboarding forms, questionnaires)βŒβœ… JSON Schema
Simple types/enums (tier names, statuses)⚠️ Eitherβœ… JSON Schema (easier)
API request/response validation⚠️ Eitherβœ… JSON Schema
AI-guided transformations⚠️ Eitherβœ… JSON Schema (better tooling)

Hybrid Strategy

Use SDL for:

  • Entities (League, Team, Player, Contract)
  • Relationships (League -[HAS_TIER]-> Tier)
  • Multi-database consistency (Postgres + Neo4j)

Keep JSON Schema for:

  • Forms and questionnaires
  • API contracts
  • Validation rules
  • Simple types (enums, primitives)

Effort Estimate

Phase 1: Proof of Concept (1-2 weeks)

  • Effort: 20-30 hours
  • Deliverable: Working codegen for one domain (league tiers)
  • Risk: Low (doesn't break existing system)

Phase 2: Core Entities (2-3 weeks)

  • Effort: 40-60 hours
  • Deliverable: Leagues, Contracts, Teams, Partnerships in SDL
  • Risk: Medium (parallel to existing system, migration needed)

Phase 3: Sport-Specific Models (3-4 weeks)

  • Effort: 60-80 hours
  • Deliverable: Combat, Racing, Team sports in SDL
  • Risk: Medium (complex domain modeling)

Total: 8-12 weeks (120-170 hours)


Recommendation

βœ… Yes, adopt SDL β€” but incrementally

Why:

  1. You already have the foundation: Multi-target codegen, version management, separation of concerns
  2. SDL fills critical gaps: First-class relationships, Neo4j integration, cross-database consistency
  3. Low risk: Layer SDL on top of existing JSON Schema (hybrid approach)
  4. High ROI: One SDL file β†’ Postgres + Prisma + Neo4j + Pydantic + TypeScript

Start Small

  1. This week: Create league_tier.graphql and basic codegen script
  2. Next week: Generate Postgres DDL + Prisma schema + Neo4j constraints
  3. Validate: Can you create League β†’ Tier relationships in all three databases?
  4. Iterate: Add more entities once the pattern works

Key Principle

Keep what works (JSON Schema for forms, validation), add what's missing (SDL for entities, relationships).


Next Steps

  1. Review this analysis β€” Does the hybrid approach make sense for your use case?
  2. Prototype SDL codegen β€” Create generate_from_sdl.py for Postgres + Prisma + Neo4j
  3. Pick one domain β€” Start with Leagues + Tiers (well-understood, critical to your business)
  4. Test end-to-end β€” Can you create relationships in all three systems from one SDL file?
  5. Decide on migration β€” If POC succeeds, plan phased migration of core entities

Questions to answer before proceeding:

  1. How critical is Neo4j graph querying to your roadmap? (High = prioritize SDL)
  2. How often do your entity relationships change? (Often = SDL pays off quickly)
  3. Do you have bandwidth for 1-2 weeks of POC work? (If yes, start this week)
  4. Are you comfortable with hybrid SDL + JSON Schema? (Or prefer single system?)

Let me know your thoughts and I can help with the POC implementation!

Platform

Documentation

Community

Support

partnership@altsportsdata.comdev@altsportsleagues.ai

2025 Β© AltSportsLeagues.ai. Powered by AI-driven sports business intelligence.

πŸ€– AI-Enhancedβ€’πŸ“Š Data-Drivenβ€’βš‘ Real-Time