Source: data_layer/docs/SCHEMA_SDL_MIGRATION_ANALYSIS.md

GraphQL SDL Migration Analysis

Current State Assessment

What You Have (✅ Strong Foundation)

Your current schema system is well-architected and already follows many best practices:

database/schemas/
├── domain/v1/              # ✅ Single Source of Truth (JSON Schema)
│   ├── classification/
│   ├── forms/
│   ├── match-types/
│   └── types/
│
├── generated/adapters/     # ✅ Multi-target code generation
│   ├── python/v1/          # Pydantic models
│   ├── typescript/v1/      # TS interfaces + Zod
│   ├── drizzle/v1/         # Drizzle ORM
│   ├── neo4j/v1/           # Graph stubs
│   ├── postgresql/v1/      # SQL DDL stubs
│   └── redshift/v1/        # Redshift DDL stubs
│
└── infrastructure/         # ✅ Infrastructure layer separation
    ├── postgresql/         # Raw SQL DDL
    └── prisma/            # Prisma schemas

Current Architecture Strengths

✅ Separation of Concerns
- Domain schemas (business logic) ≠ Infrastructure (database-specific)
- Clean boundary between canonical definitions and generated code
✅ Multi-Target Generation
- Already generating Python, TypeScript, Drizzle from JSON Schema
- ./scripts/regenerate_adapters.sh orchestrates everything
- Version management (v1, v2, etc.) built-in

✅ Clear Import Hierarchy

# Source of truth
database/schemas/domain/v1/tier.schema.json
 
# Generated adapter (use this)
from database.schemas.generated.adapters.python.v1.tier import TierClassification
 
# Extended with business logic (optional)
from app.models.tier_models import EnhancedTierClassification

✅ Database-Specific Infrastructure
- PostgreSQL DDL in infrastructure/postgresql/
- Prisma schema in infrastructure/prisma/
- Already separate from domain layer

Current Architecture Gaps

1. ❌ No First-Class Relationships

JSON Schema struggles with relationships:

// Current: tier.schema.json
{
  "properties": {
    "tier_name": {"type": "string"},
    "description": {"type": "string"}
  }
}
 
// Current: league.schema.json
{
  "properties": {
    "name": {"type": "string"},
    "tier": {"type": "string"}  // ❌ Just a string, no relationship metadata
  }
}

What's missing:

Cardinality (1:1, 1:many, many:many)
Relationship direction and names
Relationship properties (e.g., since_date on a partnership)
Bidirectional navigation

2. ❌ Limited Graph Capabilities

Your Neo4j adapter generates stubs, not real Cypher:

# database/schemas/generated/adapters/neo4j/v1/tier.py (current)
"""
Generated Neo4j adapter for tier.schema.json
TODO: Implement actual Cypher generation
"""

What you want:

// Auto-generated constraints and indexes
CREATE CONSTRAINT tier_name IF NOT EXISTS FOR (t:Tier) REQUIRE t.tier_name IS UNIQUE;
CREATE INDEX tier_description IF NOT EXISTS FOR (t:Tier) ON (t.description);
 
// Auto-generated relationship queries
MATCH (l:League)-[:HAS_TIER]->(t:Tier)
WHERE t.tier_name = $tier_name
RETURN l;

3. ❌ Manual SQL/Prisma Synchronization

Currently, you have two separate systems:

database/schemas/domain/v1/ → generates Pydantic/TS/Drizzle
database/schemas/infrastructure/postgresql/ → hand-written SQL DDL
database/schemas/infrastructure/prisma/schema.prisma → hand-written Prisma

The problem: Changes to domain schemas require manual updates to SQL/Prisma.

4. ❌ No Cross-Language Consistency for Relationships

When you define a relationship, you want it consistent across:

Postgres foreign keys
Neo4j edges
Prisma relations
Pydantic validators
TypeScript types

Current state: Each system has different relationship definitions.

GraphQL SDL/LinkML Proposal

What It Solves

GraphQL SDL (with directives) addresses all four gaps:

# database/schemas/domain/v1/sdl/league_tier.graphql
 
type Tier @id(field: "tier_name") @unique(fields: ["tier_name"]) {
  tier_name: String! @id
  description: String
  priority_score: Int
 
  # First-class relationship definition
  leagues: [League!]! @relationship(type: "HAS_TIER", direction: IN)
}
 
type League @id(field: "league_id") {
  league_id: ID! @id
  name: String!
  sport: String!
 
  # Relationship with cardinality and direction
  tier: Tier! @relationship(type: "HAS_TIER", direction: OUT)
  teams: [Team!]! @relationship(type: "HAS_TEAM", direction: OUT)
}
 
type Team @id(field: "team_id") {
  team_id: ID! @id
  name: String!
  league: League! @relationship(type: "HAS_TEAM", direction: IN)
}

What Gets Generated

From this single SDL file, generate:

1. PostgreSQL DDL

CREATE TABLE tiers (
  tier_name VARCHAR(100) PRIMARY KEY,
  description TEXT,
  priority_score INT
);
 
CREATE TABLE leagues (
  league_id UUID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  sport VARCHAR(100) NOT NULL,
  tier_name VARCHAR(100) NOT NULL REFERENCES tiers(tier_name)
);
 
CREATE TABLE teams (
  team_id UUID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  league_id UUID NOT NULL REFERENCES leagues(league_id)
);

2. Neo4j Cypher

// Constraints
CREATE CONSTRAINT tier_unique IF NOT EXISTS FOR (t:Tier) REQUIRE t.tier_name IS UNIQUE;
CREATE CONSTRAINT league_unique IF NOT EXISTS FOR (l:League) REQUIRE l.league_id IS UNIQUE;
 
// Indexes
CREATE INDEX tier_priority IF NOT EXISTS FOR (t:Tier) ON (t.priority_score);
 
// Sample relationship creation (in application code)
MATCH (l:League {league_id: $league_id})
MATCH (t:Tier {tier_name: $tier_name})
MERGE (l)-[:HAS_TIER]->(t);

3. Prisma Schema

model Tier {
  tier_name      String   @id
  description    String?
  priority_score Int?
  leagues        League[] @relation("HAS_TIER")
}
 
model League {
  league_id String @id @default(uuid())
  name      String
  sport     String
  tier_name String
  tier      Tier   @relation("HAS_TIER", fields: [tier_name], references: [tier_name])
  teams     Team[] @relation("HAS_TEAM")
}
 
model Team {
  team_id   String @id @default(uuid())
  name      String
  league_id String
  league    League @relation("HAS_TEAM", fields: [league_id], references: [league_id])
}

4. Pydantic Models (with relationships)

from pydantic import BaseModel, Field
from typing import List, Optional
 
class Tier(BaseModel):
    tier_name: str = Field(..., description="Primary identifier")
    description: Optional[str] = None
    priority_score: Optional[int] = None
 
    # Relationship metadata
    class Config:
        relationships = {
            "leagues": {"type": "League", "cardinality": "many", "direction": "in"}
        }
 
class League(BaseModel):
    league_id: str = Field(..., description="Primary identifier")
    name: str
    sport: str
    tier_name: str
 
    class Config:
        relationships = {
            "tier": {"type": "Tier", "cardinality": "one", "direction": "out"},
            "teams": {"type": "Team", "cardinality": "many", "direction": "out"}
        }

5. TypeScript Types + Zod

import { z } from 'zod';
 
export const TierSchema = z.object({
  tier_name: z.string(),
  description: z.string().optional(),
  priority_score: z.number().optional(),
});
 
export type Tier = z.infer<typeof TierSchema>;
 
export const LeagueSchema = z.object({
  league_id: z.string().uuid(),
  name: z.string(),
  sport: z.string(),
  tier_name: z.string(),
});
 
export type League = z.infer<typeof LeagueSchema>;
 
// Relationship metadata
export const LeagueRelationships = {
  tier: { type: 'Tier', cardinality: 'one' },
  teams: { type: 'Team', cardinality: 'many' },
} as const;

Migration Path: Hybrid Approach

Recommended Strategy: Incremental Hybrid

Don't throw away your current system. Layer SDL on top of it:

database/schemas/
├── domain/
│   ├── v1/                    # KEEP: JSON Schema for basic types
│   │   ├── types/             # Primitives, enums, simple objects
│   │   └── forms/             # Form definitions (JSON Schema works great here)
│   │
│   └── sdl/                   # NEW: GraphQL SDL for entities + relationships
│       └── v1/
│           ├── league_tier.graphql
│           ├── contracts.graphql
│           ├── combat_sports.graphql
│           └── team_sports.graphql
│
├── generated/adapters/        # ENHANCED: Add relationship metadata
│   ├── python/v1/
│   ├── typescript/v1/
│   └── ...
│
└── infrastructure/            # AUTO-GENERATED: From SDL
    ├── postgresql/
    │   └── generated_from_sdl/  # Auto-generated DDL
    ├── prisma/
    │   └── generated.schema.prisma  # Auto-generated from SDL
    └── neo4j/
        └── generated_constraints.cypher  # Auto-generated from SDL

Migration Phases

Phase 1: Proof of Concept (1-2 weeks)

Goal: Validate SDL codegen for one domain (league tiers)

Tasks:

Create database/schemas/domain/sdl/v1/league_tier.graphql
Write codegen script database/setup/generate_from_sdl.py
Generate:
- PostgreSQL DDL
- Prisma schema
- Neo4j constraints
- Enhanced Pydantic models
Test: Can you create League → Tier relationships in all three systems?

Success Criteria:

Single SDL file generates working Postgres tables, Prisma models, and Neo4j constraints
Relationship consistency verified across all three

Phase 2: Core Entities (2-3 weeks)

Goal: Migrate core business entities to SDL

Entities to migrate:

Leagues + Tiers
Contracts + Terms
Teams + Players
Partnerships + Contacts

Keep as JSON Schema:

Form definitions (onboarding forms, questionnaires)
Simple enums and primitives
Validation-only schemas

Phase 3: Sport-Specific Models (3-4 weeks)

Goal: Add sport-specific entities (combat, racing, team sports)

Approach:

# Shared base
interface SportEvent {
  event_id: ID!
  date: DateTime!
  venue: String
}
 
# Sport-specific implementations
type CombatBout implements SportEvent {
  event_id: ID!
  date: DateTime!
  venue: String
  fighter_a: Fighter! @relationship(type: "FIGHTS_IN", direction: IN)
  fighter_b: Fighter! @relationship(type: "FIGHTS_IN", direction: IN)
  weight_class: String!
  method_of_victory: String
}
 
type RaceEvent implements SportEvent {
  event_id: ID!
  date: DateTime!
  venue: String
  track_length: Float!
  weather_conditions: String
  participants: [Driver!]! @relationship(type: "COMPETES_IN", direction: IN)
}

Phase 4: Full Automation (Ongoing)

Goal: One command regenerates everything

# Regenerate all adapters from SDL + JSON Schema
./scripts/regenerate_all_schemas.sh
 
# Internally calls:
python database/setup/generate_from_sdl.py    # SDL → SQL/Prisma/Neo4j/Pydantic
python database/setup/generate_adapters.py    # JSON Schema → Pydantic/TS/Zod

Implementation: SDL Codegen Script

Minimal Codegen Architecture

# database/setup/generate_from_sdl.py
 
from pathlib import Path
from typing import List
import graphql
 
SDL_ROOT = Path("database/schemas/domain/sdl/v1")
INFRA_ROOT = Path("database/schemas/infrastructure")
 
def parse_sdl(sdl_path: Path) -> graphql.DocumentNode:
    """Parse GraphQL SDL file."""
    return graphql.parse(sdl_path.read_text())
 
def generate_postgresql(schema: graphql.DocumentNode) -> str:
    """Generate PostgreSQL DDL from SDL."""
    ddl = []
    for definition in schema.definitions:
        if isinstance(definition, graphql.ObjectTypeDefinitionNode):
            ddl.append(generate_table(definition))
    return "\n\n".join(ddl)
 
def generate_table(type_def: graphql.ObjectTypeDefinitionNode) -> str:
    """Generate CREATE TABLE statement."""
    table_name = type_def.name.value.lower() + "s"  # Simple pluralization
    columns = []
 
    for field in type_def.fields:
        # Check for @id directive
        is_id = any(d.name.value == "id" for d in (field.directives or []))
        column = generate_column(field, is_id)
        columns.append(column)
 
    return f"CREATE TABLE {table_name} (\n  " + ",\n  ".join(columns) + "\n);"
 
def generate_column(field: graphql.FieldDefinitionNode, is_id: bool) -> str:
    """Generate column definition."""
    field_name = field.name.value
    field_type = graphql_to_sql_type(field.type)
 
    constraints = []
    if is_id:
        constraints.append("PRIMARY KEY")
    if isinstance(field.type, graphql.NonNullTypeNode):
        constraints.append("NOT NULL")
 
    return f"{field_name} {field_type} " + " ".join(constraints)
 
def graphql_to_sql_type(gql_type) -> str:
    """Map GraphQL type to SQL type."""
    # Unwrap NonNull and List types
    if isinstance(gql_type, graphql.NonNullTypeNode):
        gql_type = gql_type.type
    if isinstance(gql_type, graphql.ListTypeNode):
        return "JSONB"  # Store arrays as JSONB
 
    type_name = gql_type.name.value
 
    mapping = {
        "ID": "UUID",
        "String": "VARCHAR(255)",
        "Int": "INTEGER",
        "Float": "DOUBLE PRECISION",
        "Boolean": "BOOLEAN",
        "DateTime": "TIMESTAMPTZ",
    }
    return mapping.get(type_name, "TEXT")
 
# Similar generators for Prisma, Neo4j, Pydantic...

Generated Outputs

From SDL:

infrastructure/
├── postgresql/
│   └── generated/
│       └── league_tier.sql          # ✅ Auto-generated
│
├── prisma/
│   └── generated/
│       └── league_tier.prisma       # ✅ Auto-generated
│
└── neo4j/
    └── generated/
        └── league_tier_constraints.cypher  # ✅ Auto-generated

Decision Framework

When to Use SDL vs JSON Schema

Use Case	Use SDL	Use JSON Schema
Entities with relationships (Leagues, Teams, Contracts)	✅ SDL	❌
Graph queries (Neo4j relationships)	✅ SDL	❌
Cross-database consistency (Postgres + Neo4j + Prisma)	✅ SDL	⚠️ Possible but harder
Form definitions (onboarding forms, questionnaires)	❌	✅ JSON Schema
Simple types/enums (tier names, statuses)	⚠️ Either	✅ JSON Schema (easier)
API request/response validation	⚠️ Either	✅ JSON Schema
AI-guided transformations	⚠️ Either	✅ JSON Schema (better tooling)

Hybrid Strategy

Use SDL for:

Entities (League, Team, Player, Contract)
Relationships (League -[HAS_TIER]-> Tier)
Multi-database consistency (Postgres + Neo4j)

Keep JSON Schema for:

Forms and questionnaires
API contracts
Validation rules
Simple types (enums, primitives)

Effort Estimate

Phase 1: Proof of Concept (1-2 weeks)

Effort: 20-30 hours
Deliverable: Working codegen for one domain (league tiers)
Risk: Low (doesn't break existing system)

Phase 2: Core Entities (2-3 weeks)

Effort: 40-60 hours
Deliverable: Leagues, Contracts, Teams, Partnerships in SDL
Risk: Medium (parallel to existing system, migration needed)

Phase 3: Sport-Specific Models (3-4 weeks)

Effort: 60-80 hours
Deliverable: Combat, Racing, Team sports in SDL
Risk: Medium (complex domain modeling)

Total: 8-12 weeks (120-170 hours)

Recommendation

✅ Yes, adopt SDL — but incrementally

Why:

You already have the foundation: Multi-target codegen, version management, separation of concerns
SDL fills critical gaps: First-class relationships, Neo4j integration, cross-database consistency
Low risk: Layer SDL on top of existing JSON Schema (hybrid approach)
High ROI: One SDL file → Postgres + Prisma + Neo4j + Pydantic + TypeScript

Start Small

This week: Create league_tier.graphql and basic codegen script
Next week: Generate Postgres DDL + Prisma schema + Neo4j constraints
Validate: Can you create League → Tier relationships in all three databases?
Iterate: Add more entities once the pattern works

Key Principle

Keep what works (JSON Schema for forms, validation), add what's missing (SDL for entities, relationships).

Next Steps

Review this analysis — Does the hybrid approach make sense for your use case?
Prototype SDL codegen — Create generate_from_sdl.py for Postgres + Prisma + Neo4j
Pick one domain — Start with Leagues + Tiers (well-understood, critical to your business)
Test end-to-end — Can you create relationships in all three systems from one SDL file?
Decide on migration — If POC succeeds, plan phased migration of core entities

Questions to answer before proceeding:

How critical is Neo4j graph querying to your roadmap? (High = prioritize SDL)
How often do your entity relationships change? (Often = SDL pays off quickly)
Do you have bandwidth for 1-2 weeks of POC work? (If yes, start this week)
Are you comfortable with hybrid SDL + JSON Schema? (Or prefer single system?)

Let me know your thoughts and I can help with the POC implementation!

Google Drive Sync Setup Guide Internal Infrastructure

GraphQL SDL Migration Analysis

Current State Assessment

What You Have (✅ Strong Foundation)

Current Architecture Strengths

Current Architecture Gaps

1. ❌ No First-Class Relationships

2. ❌ Limited Graph Capabilities

3. ❌ Manual SQL/Prisma Synchronization

4. ❌ No Cross-Language Consistency for Relationships

GraphQL SDL/LinkML Proposal

What It Solves

What Gets Generated

1. PostgreSQL DDL

2. Neo4j Cypher

3. Prisma Schema

4. Pydantic Models (with relationships)

5. TypeScript Types + Zod

Migration Path: Hybrid Approach

Recommended Strategy: Incremental Hybrid

Migration Phases

Phase 1: Proof of Concept (1-2 weeks)

Phase 2: Core Entities (2-3 weeks)

Phase 3: Sport-Specific Models (3-4 weeks)

Phase 4: Full Automation (Ongoing)

Implementation: SDL Codegen Script

Minimal Codegen Architecture

Generated Outputs

Decision Framework

When to Use SDL vs JSON Schema

Hybrid Strategy

Effort Estimate

Phase 1: Proof of Concept (1-2 weeks)

Phase 2: Core Entities (2-3 weeks)

Phase 3: Sport-Specific Models (3-4 weeks)

Total: 8-12 weeks (120-170 hours)

Recommendation

✅ Yes, adopt SDL — but incrementally

Start Small

Key Principle

Next Steps

Platform

Documentation

Community

Support