Source: data_layer/docs/SCHEMA_MAPPING.md

Schema Mapping: JSON Schema ↔ Neo4j Cypher

Architecture Decision

Decision: Maintain separate JSON Schema and Neo4j Cypher schema files (no GraphQL, no x-graph extensions)

Rationale:

JSON Schema = Validation layer (API contracts, data validation)
Neo4j Cypher = Graph topology layer (relationships, constraints, indexes)
GraphQL = Query language (not needed for internal AI-driven workflows)
x-graph extensions = Non-standard hack (pollutes schemas, no tooling support)

Current Architecture

database/
├── schemas/domain/v1/*.json     # JSON Schema (validation)
│   └── Generated → Pydantic, TypeScript, Drizzle
│
└── sql/neo4j_*.cypher           # Neo4j schema (graph)
    └── Applied → Neo4j database

The two schema sets are kept in sync manually, with this file as the reference.

Entity Mapping

League Entity

JSON Schema: database/schemas/domain/v1/league_payload_schema.json

{
  "type": "object",
  "properties": {
    "league_id": {"type": "string", "pattern": "^[A-Z0-9_-]{3,40}$"},
    "name": {"type": "string", "minLength": 2},
    "sport": {"type": "string"},
    "tier": {"type": "string", "enum": ["T1", "T2", "T3", "T4"]},
    "verified": {"type": "boolean"}
  },
  "required": ["league_id", "name", "sport"]
}

Neo4j Cypher: database/sql/neo4j_comprehensive_schema.cypher

// Constraints
CREATE CONSTRAINT league_id_unique IF NOT EXISTS
  FOR (l:League) REQUIRE l.id IS UNIQUE;
 
// Indexes
CREATE INDEX league_name_idx IF NOT EXISTS
  FOR (l:League) ON (l.name);
 
// Node properties
CREATE (league:League {
  id: 'league_id',          // maps to league_id in JSON
  name: 'string',           // maps to name in JSON
  sport: 'string',          // maps to sport in JSON
  tier: 'T1|T2|T3|T4',     // maps to tier in JSON
  verified: true            // placeholder BOOLEAN value; maps to verified in JSON
})

Mapping:

| JSON Schema Property | Neo4j Property | Notes |
|---|---|---|
| `league_id` | `id` | Neo4j uses `id` (shorter) |
| `name` | `name` | 1:1 mapping |
| `sport` | `sport` | 1:1 mapping |
| `tier` | `tier` | Enum validated in JSON, string in Neo4j |
| `verified` | `verified` | 1:1 mapping |
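The League mapping can be applied mechanically when a validated payload is written to the graph. A minimal Python sketch, assuming a hypothetical `LEAGUE_PROPERTY_MAP` table and `to_neo4j_properties` helper (neither is part of the repository) and made-up sample values:

```python
# Hypothetical rename table copied from the League mapping above.
LEAGUE_PROPERTY_MAP = {
    "league_id": "id",
    "name": "name",
    "sport": "sport",
    "tier": "tier",
    "verified": "verified",
}


def to_neo4j_properties(payload: dict, property_map: dict) -> dict:
    """Rename JSON payload keys to their Neo4j property names.

    Optional keys absent from the payload are simply not emitted. Keys with
    no documented mapping raise, so an undocumented property cannot reach
    the graph silently.
    """
    unknown = set(payload) - set(property_map)
    if unknown:
        raise KeyError(f"No documented mapping for: {sorted(unknown)}")
    return {property_map[key]: value for key, value in payload.items()}


# Sample payload (illustrative values only).
league = {"league_id": "LEAGUE_001", "name": "Example League", "sport": "basketball"}
props = to_neo4j_properties(league, LEAGUE_PROPERTY_MAP)
```

Raising on unmapped keys is a design choice for this sketch: it turns a missing row in the mapping table into an immediate error instead of silent drift.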

Team Entity

JSON Schema: database/schemas/domain/v1/team_schema.json (if exists)

{
  "type": "object",
  "properties": {
    "team_id": {"type": "string"},
    "name": {"type": "string", "minLength": 2},
    "league_id": {"type": "string"}
  }
}

Neo4j Cypher:

CREATE CONSTRAINT team_id_unique IF NOT EXISTS
  FOR (t:Team) REQUIRE t.id IS UNIQUE;
 
CREATE (team:Team {
  id: 'team_id',
  name: 'string'
})
 
// Relationship to League
CREATE (team)-[:COMPETES_IN {season: 'string'}]->(league:League)

Mapping:

| JSON Schema | Neo4j Graph | Notes |
|---|---|---|
| `team_id` | `Team.id` | Property mapping |
| `name` | `Team.name` | Property mapping |
| `league_id` (foreign key) | `[:COMPETES_IN]->(League)` | FK becomes relationship |

Player Entity

JSON Schema: database/schemas/domain/v1/player_schema.json (if exists)

{
  "type": "object",
  "properties": {
    "player_id": {"type": "string"},
    "full_name": {"type": "string"},
    "team_id": {"type": "string"}
  }
}

Neo4j Cypher:

CREATE CONSTRAINT player_id_unique IF NOT EXISTS
  FOR (p:Player) REQUIRE p.id IS UNIQUE;
 
CREATE (player:Player {
  id: 'player_id',
  fullName: 'string'
})
 
CREATE (player)-[:PLAYS_FOR {since: date()}]->(team:Team)

Mapping:

| JSON Schema | Neo4j Graph | Notes |
|---|---|---|
| `player_id` | `Player.id` | Property mapping |
| `full_name` | `Player.fullName` | camelCase in Neo4j |
| `team_id` (FK) | `[:PLAYS_FOR]->(Team)` | FK becomes relationship |

Relationship Mapping Patterns

Pattern 1: Foreign Key → Relationship

JSON payload (relational thinking; `league_id` acts as a foreign key):

{
  "team": {
    "team_id": "TEAM_001",
    "league_id": "LEAGUE_001"
  }
}

Neo4j (graph thinking):

(team:Team {id: 'TEAM_001'})-[:COMPETES_IN]->(league:League {id: 'LEAGUE_001'})
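One way to turn the foreign key into a relationship at write time is to build a parameterized Cypher statement from the payload. The sketch below assumes a hypothetical helper named `foreign_key_to_relationship`; using `MERGE` (so that re-running the statement does not duplicate nodes or relationships) is this example's choice, not something the schema files prescribe.

```python
def foreign_key_to_relationship(team: dict) -> tuple[str, dict]:
    """Return (query, params) linking a Team to its League.

    The IDs travel as query parameters rather than being formatted into
    the Cypher text.
    """
    query = (
        "MERGE (t:Team {id: $team_id}) "
        "MERGE (l:League {id: $league_id}) "
        "MERGE (t)-[:COMPETES_IN]->(l)"
    )
    params = {"team_id": team["team_id"], "league_id": team["league_id"]}
    return query, params


query, params = foreign_key_to_relationship(
    {"team_id": "TEAM_001", "league_id": "LEAGUE_001"}
)
```

With the official Neo4j Python driver, the pair can be passed to `session.run(query, params)`.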

Pattern 2: Nested Objects → Relationship Properties

JSON payload:

{
  "player": {
    "player_id": "PLR_001",
    "contract": {
      "start_date": "2025-01-01",
      "end_date": "2026-12-31"
    }
  }
}

Neo4j:

(player:Player)-[:HAS_CONTRACT {
  startDate: date('2025-01-01'),
  endDate: date('2026-12-31')
}]->(contract:Contract)

Pattern 3: Array of IDs → Multiple Relationships

JSON payload:

{
  "league": {
    "league_id": "LEAGUE_001",
    "team_ids": ["TEAM_001", "TEAM_002", "TEAM_003"]
  }
}

Neo4j (one relationship per array element, each pointing at the same League node):

(team1:Team {id: 'TEAM_001'})-[:COMPETES_IN]->(league:League {id: 'LEAGUE_001'})
(team2:Team {id: 'TEAM_002'})-[:COMPETES_IN]->(league)
(team3:Team {id: 'TEAM_003'})-[:COMPETES_IN]->(league)
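For arrays of IDs, a single statement with `UNWIND` can create one relationship per element. A minimal sketch, assuming a hypothetical helper `team_ids_to_relationships` and assuming the League node already exists (hence `MATCH` for the league and `MERGE` for the teams):

```python
def team_ids_to_relationships(league: dict) -> tuple[str, dict]:
    """Return (query, params) creating one COMPETES_IN relationship per team ID."""
    query = (
        "MATCH (l:League {id: $league_id}) "
        "UNWIND $team_ids AS team_id "
        "MERGE (t:Team {id: team_id}) "
        "MERGE (t)-[:COMPETES_IN]->(l)"
    )
    params = {
        "league_id": league["league_id"],
        "team_ids": list(league["team_ids"]),
    }
    return query, params


query, params = team_ids_to_relationships(
    {"league_id": "LEAGUE_001", "team_ids": ["TEAM_001", "TEAM_002", "TEAM_003"]}
)
```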

Type Mapping

| JSON Schema Type | Neo4j Type | Notes |
|---|---|---|
| `string` | `STRING` | Direct mapping |
| `integer` | `INTEGER` | Direct mapping |
| `number` | `INTEGER` or `FLOAT` | Depends on use case: `INTEGER` for whole numbers, `FLOAT` otherwise |
| `boolean` | `BOOLEAN` | Direct mapping |
| `string` (format: date) | `DATE` | Use `date()` function |
| `string` (format: date-time) | `DATETIME` | Use `datetime()` function |
| `array` | `LIST` | Neo4j native list type |
| `object` | Node with relationship | Nested object → separate node |
| `enum` | `STRING` | Allowed values are validated in JSON Schema only; Neo4j stores a plain string |
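The type table can be expressed as a small lookup function. This is a sketch under stated assumptions: `neo4j_type_for` is a hypothetical name, `integer` is assumed to map to `INTEGER` and `number` to `FLOAT` (one reading of "depends on use case"), and `"NODE"` is only a marker meaning "model as a separate node", not a Neo4j property type.

```python
def neo4j_type_for(prop_schema: dict) -> str:
    """Return the Neo4j type name for one JSON Schema property definition."""
    if "enum" in prop_schema:
        return "STRING"  # enum values are validated on the JSON side only
    json_type = prop_schema.get("type")
    if json_type == "string":
        formats = {"date": "DATE", "date-time": "DATETIME"}
        return formats.get(prop_schema.get("format"), "STRING")
    simple = {
        "integer": "INTEGER",
        "number": "FLOAT",
        "boolean": "BOOLEAN",
        "array": "LIST",
    }
    if isinstance(json_type, str) and json_type in simple:
        return simple[json_type]
    if json_type == "object":
        return "NODE"  # nested object becomes a separate node plus relationship
    raise ValueError(f"No Neo4j mapping for JSON Schema type: {json_type!r}")


tier_type = neo4j_type_for({"type": "string", "enum": ["T1", "T2", "T3", "T4"]})
```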

Validation Responsibilities

JSON Schema Validates:

✅ Data types (string, number, boolean)
✅ Required fields
✅ Format constraints (email, URL, date)
✅ Pattern matching (regex)
✅ Min/max length
✅ Enum values
✅ Nested object structures

Neo4j Validates:

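✅ Uniqueness constraints (node IDs)
✅ Existence constraints (required properties; these need Neo4j Enterprise Edition)
✅ Relationship cardinality (by convention: standard Neo4j constraints do not enforce it, so write queries or the validation script must check it)
✅ Graph topology (valid relationships; also by convention rather than by constraint)
✅ Index performance (indexes speed up lookups but do not validate data)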

Principle: JSON Schema guards API boundaries, Neo4j guards graph integrity.
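Where existence constraints are available, they can be derived from the JSON Schema `required` list so the two layers agree on which fields are mandatory. A minimal sketch, assuming a hypothetical helper `existence_constraints` and a hypothetical `<label>_<property>_exists` naming scheme; the `IS NOT NULL` constraint syntax is the Neo4j 5 form.

```python
def existence_constraints(label: str, schema: dict, renames: dict) -> list[str]:
    """Build one property existence constraint per required JSON Schema field.

    `renames` maps JSON property names to Neo4j property names for the
    documented exceptions (for example league_id -> id).
    """
    var = label[0].lower()
    statements = []
    for field in schema.get("required", []):
        prop = renames.get(field, field)
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{prop}_exists IF NOT EXISTS\n"
            f"  FOR ({var}:{label}) REQUIRE {var}.{prop} IS NOT NULL;"
        )
    return statements


league_schema = {"required": ["league_id", "name", "sport"]}
statements = existence_constraints("League", league_schema, {"league_id": "id"})
```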

Sync Workflow

When Creating New Entity:

1. Define JSON Schema first (database/schemas/domain/v1/new_entity.schema.json)
   - Define properties, types, validation rules
   - Generate Pydantic models: ./scripts/regenerate_adapters.sh
2. Define Neo4j schema (database/sql/neo4j_comprehensive_schema.cypher)
   - Add constraints for unique IDs
   - Add indexes for common queries
   - Define node label and properties
   - Define relationships to other entities
3. Document mapping (update this file)
   - Add entity to "Entity Mapping" section
   - Document property name differences
   - Document relationship patterns
4. Validate sync (manual check)
   - Every JSON Schema property has corresponding Neo4j property OR relationship
   - Every Neo4j node has corresponding JSON Schema
   - Foreign keys in JSON → relationships in Neo4j

When Modifying Entity:

1. Update JSON Schema → regenerate adapters
2. Update Neo4j Cypher → apply migration
3. Update this mapping doc → keep documentation in sync
4. Run validation script (see below) once it exists; until then, repeat the manual check

Validation Script

Location: database/scripts/validate_schema_sync.py (to be created)

Purpose: Automated check that JSON Schema and Neo4j Cypher stay in sync

Checks:

Every JSON Schema entity has Neo4j node definition
Every Neo4j node has JSON Schema definition
Property names match (with documented exceptions)
Foreign keys map to relationships
Required fields in JSON have existence constraints in Neo4j

Usage:

python database/scripts/validate_schema_sync.py
# Output: PASS or list of mismatches
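Since the script is still to be created, the following is only a sketch of how the property-name check might work, not its actual design. It assumes node templates are written as `CREATE (x:Label {...})` as in this document, and every name in it (`cypher_node_properties`, `find_mismatches`, `renames`, `relationship_keys`) is hypothetical. The regex-based parsing is deliberately naive and would not cope with `//` or `}` inside string values.

```python
import re

# Matches node templates such as: CREATE (league:League { ... })
NODE_TEMPLATE = re.compile(r"CREATE\s*\(\s*\w+\s*:\s*(\w+)\s*\{(.*?)\}\s*\)", re.DOTALL)
# Matches a property name at the start of the map or after a comma.
PROPERTY_NAME = re.compile(r"(?:^|,)\s*(\w+)\s*:")


def cypher_node_properties(cypher: str) -> dict:
    """Collect property names per label from CREATE node templates."""
    cypher = re.sub(r"//[^\n]*", "", cypher)  # drop line comments first
    nodes = {}
    for label, body in NODE_TEMPLATE.findall(cypher):
        nodes.setdefault(label, set()).update(PROPERTY_NAME.findall(body))
    return nodes


def find_mismatches(schema, label, cypher, renames, relationship_keys=frozenset()):
    """Compare JSON Schema properties with the Neo4j node template for `label`.

    `renames` holds documented name differences; `relationship_keys` lists
    foreign-key properties that map to relationships instead of properties.
    """
    node_props = cypher_node_properties(cypher).get(label)
    if node_props is None:
        return [f"{label}: no Neo4j node definition"]
    expected = {
        renames.get(name, name)
        for name in schema.get("properties", {})
        if name not in relationship_keys
    }
    problems = [f"{label}.{p}: in JSON Schema, missing in Neo4j"
                for p in sorted(expected - node_props)]
    problems += [f"{label}.{p}: in Neo4j, missing in JSON Schema"
                 for p in sorted(node_props - expected)]
    return problems


SCHEMA = {"properties": {"league_id": {}, "name": {}, "sport": {}, "tier": {}}}
CYPHER = """
CREATE (league:League {
  id: 'league_id',   // maps to league_id in JSON
  name: 'string',
  sport: 'string'
})
"""
mismatches = find_mismatches(SCHEMA, "League", CYPHER, renames={"league_id": "id"})
```

An empty list would correspond to the `PASS` output described above; here the template is missing `tier`, so one mismatch is reported.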

Examples: Common Scenarios

Scenario 1: Adding New Property

Step 1: Update JSON Schema (new property: `website_url`)

{
  "properties": {
    "league_id": {"type": "string"},
    "name": {"type": "string"},
    "website_url": {"type": "string", "format": "uri"}
  }
}

Step 2: Update Neo4j Cypher

// Add property to node template
CREATE (league:League {
  id: 'league_id',
  name: 'string',
  websiteUrl: 'string'  // NEW (camelCase)
})

Step 3: Document in this file

| `website_url` | `websiteUrl` | Snake case → camel case |
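The snake case → camel case rule used for most properties is easy to automate; documented exceptions such as `league_id` → `id` still need an explicit lookup. A minimal sketch with hypothetical names (`snake_to_camel`, `neo4j_property_name`, `EXCEPTIONS`):

```python
# Documented name differences that do not follow the camelCase rule.
EXCEPTIONS = {"league_id": "id", "team_id": "id", "player_id": "id"}


def snake_to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. website_url -> websiteUrl."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def neo4j_property_name(json_name: str, exceptions: dict = EXCEPTIONS) -> str:
    """Apply documented exceptions first, then the camelCase rule."""
    return exceptions.get(json_name, snake_to_camel(json_name))


new_property = neo4j_property_name("website_url")
```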

Scenario 2: Adding New Relationship

Step 1: Update JSON Schema (optional; the relationship may be implicit via a foreign key). New property: `sportsbook_ids`

{
  "league": {
    "sportsbook_ids": ["array", "of", "ids"]
  }
}

Step 2: Define Neo4j relationship

CREATE (league:League)-[:PARTNERS_WITH {
  since: date('2025-01-01')
}]->(sportsbook:Sportsbook)

Step 3: Document pattern

### Pattern: Many-to-Many via Array
JSON: `sportsbook_ids` array
Neo4j: `[:PARTNERS_WITH]` relationships

Future Enhancements

Option 1: Unified YAML Spec (Long-term)

Proposed: Single source of truth generating both JSON Schema and Cypher

# unified.graph.yaml
entities:
  League:
    properties:
      id: {type: string, unique: true, indexed: true}
      name: {type: string, required: true, indexed: true}
      sport: {type: string, required: true}
 
    relationships:
      teams:
        type: COMPETES_IN
        direction: in
        from: Team

Generated:

league.schema.json (JSON Schema)
league.cypher (Neo4j constraints + indexes)
league.py (Pydantic models)
league.ts (TypeScript types)
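To make the proposal concrete, here is a rough sketch of such a generator. It is an illustration only: the spec is written as a Python dict mirroring the YAML above (so no YAML parser is needed), the function names `to_json_schema` and `to_cypher` are hypothetical, and relationships are ignored.

```python
SPEC = {
    "League": {
        "properties": {
            "id": {"type": "string", "unique": True, "indexed": True},
            "name": {"type": "string", "required": True, "indexed": True},
            "sport": {"type": "string", "required": True},
        }
    }
}


def to_json_schema(entity_spec: dict) -> dict:
    """Emit a JSON Schema object from one entity of the unified spec."""
    props = entity_spec["properties"]
    return {
        "type": "object",
        "properties": {name: {"type": p["type"]} for name, p in props.items()},
        "required": [name for name, p in props.items() if p.get("required")],
    }


def to_cypher(label: str, entity_spec: dict) -> list[str]:
    """Emit uniqueness constraints and indexes for one entity."""
    var = label[0].lower()
    statements = []
    for name, p in entity_spec["properties"].items():
        if p.get("unique"):
            # A uniqueness constraint is backed by an index, so no extra index.
            statements.append(
                f"CREATE CONSTRAINT {label.lower()}_{name}_unique IF NOT EXISTS "
                f"FOR ({var}:{label}) REQUIRE {var}.{name} IS UNIQUE;"
            )
        elif p.get("indexed"):
            statements.append(
                f"CREATE INDEX {label.lower()}_{name}_idx IF NOT EXISTS "
                f"FOR ({var}:{label}) ON ({var}.{name});"
            )
    return statements


league_schema = to_json_schema(SPEC["League"])
league_cypher = to_cypher("League", SPEC["League"])
```

Note that the spec uses the Neo4j name `id`; a real generator would also need the reverse rename to produce `league_id` on the JSON side.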

Option 2: Validation Automation

Script: database/scripts/validate_schema_sync.py

Features:

Parse JSON Schema files
Parse Cypher schema files
Compare entity definitions
Report mismatches
Integrate into CI/CD pipeline

Key Principles

JSON Schema = Validation → API contract, data validation, type safety
Neo4j Cypher = Graph → Relationships, constraints, topology
Keep them separate → Different concerns, different tools
Document mapping → This file is the bridge
Automate checks → Validation script (once created) prevents drift

Questions?

"Should I add x-graph extensions to JSON Schema?" → No, keep concerns separate
"Should I use GraphQL?" → No, not needed for internal AI workflows
"How do I keep them in sync?" → Follow this doc + validation script
"Can I generate one from the other?" → Future enhancement (unified YAML spec)
