Schema Consolidation Plan

Source: data_layer/docs/SCHEMA_CONSOLIDATION_PLAN.md

Date: 2025-01-14

Current Problem

The current schema organization is redundant and confusing:

database/
├── output-styles/schemas/schemas/     # 🔴 Double nested, 201 seed files
│   ├── seeds/                          # Should be consolidated
│   └── domain/v1/seeds/                # More seeds here too
│
└── schemas/                            # ✅ Should be canonical location
    ├── domain/v1/                      # Canonical schemas
    ├── seeds/                          # Where seeds should live
    └── generated/                      # Generated adapters

Issues Identified

  1. Double Nesting: schemas/schemas/ is redundant
  2. 201+ Seed Files: Scattered across multiple directories
  3. Duplicate Consolidation Docs: Multiple cleanup attempts documented
  4. Path Confusion: Import paths reference multiple locations
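
To see how the seed files are actually scattered before moving anything, a short inventory helps. The sketch below is illustrative only: the function name count_seed_files is hypothetical, and the default *.seed.json pattern is an assumption taken from the file names used in Phase 2 (seeds in the organized subdirectories may not follow it).

```python
from collections import Counter
from pathlib import Path


def count_seed_files(root: Path, pattern: str = "*.seed.json") -> Counter:
    """Count files matching `pattern` under `root`, grouped by parent directory.

    Keys are directory paths relative to `root` (POSIX style); a file that sits
    directly in `root` is counted under ".".
    """
    counts: Counter = Counter()
    for path in root.rglob(pattern):
        if path.is_file():
            counts[path.parent.relative_to(root).as_posix()] += 1
    return counts
```

Calling count_seed_files(Path("database/output-styles/schemas/schemas")) from the repository root and printing counts.most_common() gives a per-directory breakdown to compare against the 201 figure.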

Proposed Structure

database/
├── schemas/                           # ✅ CANONICAL LOCATION
│   ├── domain/                        # Business domain schemas
│   │   └── v1/                        # Version 1 schemas
│   │       ├── analysis/              # Analysis schemas
│   │       ├── archetypes/            # Sport archetypes
│   │       ├── combat/                # Combat sports
│   │       ├── contracts/             # Contract schemas
│   │       ├── racing/                # Racing sports
│   │       └── ...                    # Other domains
│   │
│   ├── seeds/                         # 📦 ALL SEEDS CONSOLIDATED HERE
│   │   ├── leagues/                   # League seed data
│   │   ├── questionnaires/            # Questionnaire seeds
│   │   ├── schemas/                   # Schema seeds
│   │   ├── samples/                   # Sample datasets
│   │   └── synthetic_email_seeds/     # Email seeds
│   │
│   ├── generated/                     # Generated adapters (don't touch)
│   │   └── adapters/
│   │       ├── python/v1/
│   │       ├── typescript/v1/
│   │       └── drizzle/v1/
│   │
│   ├── _archive/                      # 🗑️ OLD STUFF (for reference)
│   │   └── 2025-01-14-output-styles/  # Archived today
│   │
│   ├── registry.json                  # Schema registry
│   └── README.md                      # Documentation
│
└── output-styles/                     # 🗑️ TO BE ARCHIVED
    └── schemas/                       # Will move to _archive/

Consolidation Steps

Phase 1: Backup Current State

# Create a timestamped backup (run from the repository root, like the later phases)
cp -r database/output-styles/schemas "database/_backup_$(date +%Y%m%d_%H%M%S)"
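
Before any files are moved, it is worth confirming that the backup is a faithful copy. The following is a minimal sketch of one way to do that by comparing SHA-256 manifests of the two trees; the function names manifest and backup_matches are hypothetical and not part of the project.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def backup_matches(source: Path, backup: Path) -> bool:
    """Return True when both trees hold the same files with identical contents."""
    return manifest(source) == manifest(backup)
```

For example, backup_matches(Path("database/output-styles/schemas"), Path("database/_backup_YYYYMMDD_HHMMSS")) should return True right after Phase 1, with the placeholder replaced by the real backup directory name. Empty directories are not compared, only files.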

Phase 2: Move Seed Files

# Consolidate all seeds into one location
mkdir -p database/schemas/seeds/{leagues,questionnaires,schemas,samples,synthetic_email_seeds}
 
# Move seeds from output-styles/schemas/schemas/seeds/
mv database/output-styles/schemas/schemas/seeds/*.seed.json \
   database/schemas/seeds/
 
# Move organized seed subdirectories
mv database/output-styles/schemas/schemas/domain/v1/seeds/leagues/* \
   database/schemas/seeds/leagues/
 
mv database/output-styles/schemas/schemas/domain/v1/seeds/questionnaires/* \
   database/schemas/seeds/questionnaires/
 
mv database/output-styles/schemas/schemas/domain/v1/seeds/schemas/* \
   database/schemas/seeds/schemas/
 
mv database/output-styles/schemas/schemas/domain/v1/seeds/samples/* \
   database/schemas/seeds/samples/
 
mv database/output-styles/schemas/schemas/domain/v1/seeds/synthetic_email_seeds/* \
   database/schemas/seeds/synthetic_email_seeds/
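
A plain mv silently replaces a file in the destination that has the same name, which matters here because the canonical seeds/ directories may already contain files. As a hedged alternative, the sketch below moves entries one at a time and reports name collisions instead of overwriting; move_without_overwrite is a hypothetical helper, not an existing project script.

```python
import shutil
from pathlib import Path


def move_without_overwrite(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Move every entry of `src_dir` into `dest_dir`, never replacing existing names.

    Returns the source entries that were left in place because an entry with
    the same name already exists in `dest_dir`.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    skipped: list[Path] = []
    # Materialize the listing first so moving entries does not disturb iteration.
    for entry in sorted(src_dir.iterdir()):
        target = dest_dir / entry.name
        if target.exists():
            skipped.append(entry)
            continue
        shutil.move(str(entry), str(target))
    return skipped
```

Anything returned in the skipped list is a candidate for the duplicate review described under Seed File Categorization.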

Phase 3: Move Domain Schemas (if any new ones)

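# Check for any schemas not already in the canonical domain/v1/
# Move only unique schemas, skip duplicates
# Compare the old tree with the canonical location (run from the repository root)
diff -rq database/output-styles/schemas/schemas/domain/v1/ \
         database/schemas/domain/v1/
 
# Manual review needed - identify unique files:
# "Only in database/output-styles/..." lines are candidates to move,
# "Files ... differ" lines need a decision on which version to keep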
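
The manual review can start from a machine-readable list. This is a sketch using Python's standard filecmp.dircmp; compare_trees is a hypothetical name. Note that dircmp compares files shallowly by default (type, size, and modification time first), so two files with the same size and timestamp but different contents would not be reported as differing.

```python
import filecmp
from pathlib import Path


def compare_trees(old: Path, canonical: Path) -> dict[str, list[str]]:
    """List entries that exist only in `old` and files that differ between trees.

    Paths in the result are relative to the tree roots, in POSIX style.
    """
    report: dict[str, list[str]] = {"only_in_old": [], "differing": []}

    def walk(cmp: filecmp.dircmp, prefix: Path) -> None:
        for name in cmp.left_only:
            report["only_in_old"].append((prefix / name).as_posix())
        for name in cmp.diff_files:
            report["differing"].append((prefix / name).as_posix())
        # Recurse into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(str(old), str(canonical)), Path("."))
    report["only_in_old"].sort()
    report["differing"].sort()
    return report
```

Entries under only_in_old are the unique schemas to move; entries under differing need a human decision about which version is canonical.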

Phase 4: Archive Old Structure

# Move entire output-styles to archive
mkdir -p database/schemas/_archive/2025-01-14-output-styles
mv database/output-styles/schemas \
   database/schemas/_archive/2025-01-14-output-styles/

Phase 5: Update Documentation

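# The old consolidation docs were moved into the archive together with the
# rest of output-styles/schemas in Phase 4; lift them to the top of the archive
mv database/schemas/_archive/2025-01-14-output-styles/schemas/SCHEMA_CONSOLIDATION*.md \
   database/schemas/_archive/2025-01-14-output-styles/
 
# Keep only the main README
# Update registry.json to reflect new paths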

Phase 6: Update Import Paths

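# Find all files importing from the old location (run from the repository root)
grep -rn "output-styles/schemas" --include="*.py" --include="*.ts" .
 
# Update imports programmatically, most specific prefix first.
# `sed -i ''` is the BSD/macOS form; with GNU sed use `sed -i` without the ''.
# Seeds that lived under domain/v1/seeds/ now live directly under schemas/seeds/
find . -type f \( -name "*.py" -o -name "*.ts" \) -exec sed -i '' \
  's|output-styles/schemas/schemas/domain/v1/seeds|schemas/seeds|g' {} +
 
# Everything else: collapse the double-nested prefix onto the canonical one
find . -type f \( -name "*.py" -o -name "*.ts" \) -exec sed -i '' \
  's|output-styles/schemas/schemas|schemas|g' {} +
 
# Re-run the grep above and review any remaining matches by hand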
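
Because sed -i behaves differently on macOS and Linux, a small Python rewrite may be easier to run consistently. The sketch below is an assumption-laden illustration: the REWRITES mapping is derived from the moves in Phase 2 and should be checked against the real layout, and rewrite_text, rewrite_tree, and the skipped directory names are hypothetical. Rules are applied longest prefix first so that a shorter prefix cannot partially rewrite a longer path.

```python
from pathlib import Path

# Assumed old-prefix -> new-prefix rules, derived from the Phase 2 moves.
REWRITES = [
    ("output-styles/schemas/schemas/domain/v1/seeds", "schemas/seeds"),
    ("output-styles/schemas/schemas", "schemas"),
]


def rewrite_text(text: str, rewrites=REWRITES) -> str:
    """Apply every rule to `text`, longest old prefix first."""
    for old, new in sorted(rewrites, key=lambda rule: len(rule[0]), reverse=True):
        text = text.replace(old, new)
    return text


def rewrite_tree(
    root: Path,
    suffixes=(".py", ".ts"),
    skip_dirs=("_archive", "node_modules", ".git"),
) -> list[Path]:
    """Rewrite old paths in matching files under `root`; return the files changed."""
    changed: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Leave archived copies and vendored code untouched.
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        original = path.read_text(encoding="utf-8")
        updated = rewrite_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Running rewrite_tree(Path(".")) from the repository root on a clean working tree makes the result easy to review with git diff before committing.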

Seed File Categorization

Categorize the 201 seed files found as follows:

Keep in seeds/:

  • leagues/ - Real league seed data for testing
  • questionnaires/ - Sample questionnaires
  • samples/ - Small datasets (athletes, teams, games)
  • synthetic_email_seeds/ - Test email data

Move to _archive/:

  • Duplicate seeds (multiple versions of same league)
  • Old/outdated format seeds
  • Test seeds that are no longer relevant
  • Seeds with names like "original_snapshot-v1/v2/v3"

Delete:

  • Empty seed files
  • Malformed JSON files
  • Seeds that are exact duplicates
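
The three Delete criteria can be checked mechanically. The sketch below is one possible approach, not an existing project tool: classify_seeds is a hypothetical name, "exact duplicate" is taken to mean byte-identical content, and the first file (in sorted order) of each identical group is the one treated as the original.

```python
import hashlib
import json
from pathlib import Path


def classify_seeds(paths) -> dict[str, list[Path]]:
    """Sort seed files into empty, malformed, duplicate, and ok buckets."""
    report: dict[str, list[Path]] = {"empty": [], "malformed": [], "duplicate": [], "ok": []}
    seen: dict[str, Path] = {}
    for path in sorted(paths):
        data = path.read_bytes()
        if not data.strip():
            report["empty"].append(path)
            continue
        try:
            json.loads(data)
        except ValueError:  # covers JSONDecodeError and undecodable bytes
            report["malformed"].append(path)
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            report["duplicate"].append(path)
        else:
            seen[digest] = path
            report["ok"].append(path)
    return report
```

Files in the ok bucket have only passed these three checks; whether they stay in seeds/ or go to _archive/ still depends on the manual criteria above (outdated formats, multiple versions of the same league, and so on).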

Registry Update

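Update database/schemas/registry.json. The // lines below are placeholders for the per-schema entries and must be removed from the real file, because JSON does not allow comments: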

{
  "$schema": "https://json-schema.org/draft-07/schema",
  "description": "Schema Registry - Single source of truth",
  "version": "2.0.0",
  "schemas": {
    // Update all paths to remove "output-styles/schemas/schemas"
    // Point to "database/schemas/" as base
  },
  "seeds": {
    "description": "Seed data locations",
    "base_path": "database/schemas/seeds/",
    "categories": {
      "leagues": "seeds/leagues/",
      "questionnaires": "seeds/questionnaires/",
      "samples": "seeds/samples/",
      "schemas": "seeds/schemas/",
      "synthetic_emails": "seeds/synthetic_email_seeds/"
    }
  }
}
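
Once the registry is updated, a quick check can confirm that every seed category points at a directory that exists. This is a sketch under two stated assumptions: the real registry.json is plain JSON with no // comments, and each category value (for example "seeds/leagues/") resolves relative to database/schemas/. The function name missing_seed_categories is hypothetical.

```python
import json
from pathlib import Path


def missing_seed_categories(registry_file: Path, schemas_root: Path) -> list[str]:
    """Return the names of seed categories whose directory does not exist.

    Assumption: category paths are relative to `schemas_root`.
    """
    registry = json.loads(registry_file.read_text(encoding="utf-8"))
    categories = registry["seeds"]["categories"]
    return sorted(
        name
        for name, relative in categories.items()
        if not (schemas_root / relative).is_dir()
    )
```

An empty list from missing_seed_categories(Path("database/schemas/registry.json"), Path("database/schemas")) means all listed categories resolve. If the loader instead joins category values onto base_path, pass that directory as schemas_root.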

Success Criteria

  • All seed files consolidated into database/schemas/seeds/
  • No more output-styles/schemas/schemas/ double nesting
  • Old structure archived to _archive/
  • All import paths updated
  • Registry updated with new paths
  • Documentation cleaned up (keep only README.md)
  • No duplicate seed files
  • Tests still pass after migration
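
The "no more double nesting" criterion is easy to verify automatically. The sketch below looks for any directory that has the same name as its parent, ignoring the archive; double_nested_dirs and its defaults are hypothetical, and the other criteria (imports, registry, tests) need their own checks.

```python
from pathlib import Path


def double_nested_dirs(
    root: Path, name: str = "schemas", skip_dirs=("_archive",)
) -> list[str]:
    """Return directories called `name` whose parent is also called `name`.

    Paths are relative to `root`; anything beneath a directory listed in
    `skip_dirs` is ignored.
    """
    found: list[str] = []
    for path in root.rglob(name):
        relative = path.relative_to(root)
        if any(part in skip_dirs for part in relative.parts):
            continue
        if path.is_dir() and path.parent.name == name:
            found.append(relative.as_posix())
    return sorted(found)
```

After Phase 4, double_nested_dirs(Path("database")) is expected to return an empty list.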

Rollback Plan

If something breaks:

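# Restore seeds from the Phase 1 backup. The backup is a copy of
# output-styles/schemas, so the old seeds sit under schemas/seeds/ inside it.
# Warning: rm -rf also removes anything that was already in schemas/seeds/
# before the migration; the Phase 1 backup does not cover those files.
cd database
rm -rf schemas/seeds/*
cp -r _backup_YYYYMMDD_HHMMSS/schemas/seeds/* \
     schemas/seeds/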

Next Steps After Consolidation

  1. Review and deduplicate seeds: Identify and remove duplicate seed files
  2. Update seed loader: Ensure seed loading scripts use new paths
  3. Update tests: Fix test fixtures using old paths
  4. Documentation: Update all docs to reference new structure
  5. CI/CD: Update any build scripts referencing old paths

Questions to Answer

  • Are there any unique schemas in output-styles/schemas/schemas/domain/v1/ not in database/schemas/domain/v1/?
  • Which seeds are actively used vs just historical?
  • Do we need all 201 seed files or can some be archived?
  • Are there import references in external repos/services?

Status: 🟡 Plan Created - Ready for Review
Next: Execute Phase 1 (Backup) after approval
