Source: data_layer/docs/SCHEMA_CONSOLIDATION_PLAN.md
Schema Consolidation Plan
Date: 2025-01-14
Current Problem
Our schema organization is redundant and confusing:
database/
├── output-styles/schemas/schemas/   # 🔴 Double nested, 201 seed files
│   ├── seeds/                       # Should be consolidated
│   └── domain/v1/seeds/             # More seeds here too
│
└── schemas/                         # ✅ Should be canonical location
    ├── domain/v1/                   # Canonical schemas
    ├── seeds/                       # Where seeds should live
    └── generated/                   # Generated adapters
Issues Identified
- Double Nesting: schemas/schemas/ is redundant
- 201+ Seed Files: Scattered across multiple directories
- Duplicate Consolidation Docs: Multiple cleanup attempts documented
- Path Confusion: Import paths reference multiple locations
Proposed Structure
database/
├── schemas/                          # ✅ CANONICAL LOCATION
│   ├── domain/                       # Business domain schemas
│   │   └── v1/                       # Version 1 schemas
│   │       ├── analysis/             # Analysis schemas
│   │       ├── archetypes/           # Sport archetypes
│   │       ├── combat/               # Combat sports
│   │       ├── contracts/            # Contract schemas
│   │       ├── racing/               # Racing sports
│   │       └── ...                   # Other domains
│   │
│   ├── seeds/                        # 📦 ALL SEEDS CONSOLIDATED HERE
│   │   ├── leagues/                  # League seed data
│   │   ├── questionnaires/           # Questionnaire seeds
│   │   ├── schemas/                  # Schema seeds
│   │   ├── samples/                  # Sample datasets
│   │   └── synthetic_email_seeds/    # Email seeds
│   │
│   ├── generated/                    # Generated adapters (don't touch)
│   │   └── adapters/
│   │       ├── python/v1/
│   │       ├── typescript/v1/
│   │       └── drizzle/v1/
│   │
│   ├── _archive/                     # OLD STUFF (for reference)
│   │   └── 2025-01-14-output-styles/ # Archived today
│   │
│   ├── registry.json                 # Schema registry
│   └── README.md                     # Documentation
│
└── output-styles/                    # TO BE ARCHIVED
    └── schemas/                      # Will move to _archive/
Consolidation Steps
Phase 1: Backup Current State
# Create backup (run from the repository root, like the commands in later phases)
cp -r database/output-styles/schemas "database/_backup_$(date +%Y%m%d_%H%M%S)"
Phase 2: Move Seed Files
# Consolidate all seeds into one location
mkdir -p database/schemas/seeds/{leagues,questionnaires,schemas,samples,synthetic_email_seeds}
# Move seeds from output-styles/schemas/schemas/seeds/
mv database/output-styles/schemas/schemas/seeds/*.seed.json \
database/schemas/seeds/
# Move organized seed subdirectories
mv database/output-styles/schemas/schemas/domain/v1/seeds/leagues/* \
database/schemas/seeds/leagues/
mv database/output-styles/schemas/schemas/domain/v1/seeds/questionnaires/* \
database/schemas/seeds/questionnaires/
mv database/output-styles/schemas/schemas/domain/v1/seeds/schemas/* \
database/schemas/seeds/schemas/
mv database/output-styles/schemas/schemas/domain/v1/seeds/samples/* \
database/schemas/seeds/samples/
mv database/output-styles/schemas/schemas/domain/v1/seeds/synthetic_email_seeds/* \
database/schemas/seeds/synthetic_email_seeds/
Phase 3: Move Domain Schemas (if any new ones)
# Check for any schemas not already in the canonical domain/v1/
# Move only unique schemas, skip duplicates
# Compare the old tree with the canonical location (run from the repository root)
diff -rq database/output-styles/schemas/schemas/domain/v1/ database/schemas/domain/v1/
# Manual review needed - identify unique files
Phase 4: Archive Old Structure
# Move entire output-styles to archive
mkdir -p database/schemas/_archive/2025-01-14-output-styles
mv database/output-styles/schemas \
database/schemas/_archive/2025-01-14-output-styles/
Phase 5: Update Documentation
# Old consolidation docs were already archived by the Phase 4 move; confirm they are there
ls database/schemas/_archive/2025-01-14-output-styles/schemas/SCHEMA_CONSOLIDATION*.md
# Keep only the main README
# Update registry.json to reflect new paths
Phase 6: Update Import Paths
# Find all files importing from old location
grep -r "output-styles/schemas" --include="*.py" --include="*.ts" .
# Update imports programmatically, most specific pattern first so no schemas/schemas remains
# Note: sed -i '' is the BSD/macOS form; with GNU sed use sed -i without the empty argument
find . -type f \( -name "*.py" -o -name "*.ts" \) -exec sed -i '' \
  -e 's|output-styles/schemas/schemas/domain/v1/seeds|schemas/seeds|g' \
  -e 's|output-styles/schemas/schemas|schemas|g' \
  -e 's|output-styles/schemas|schemas|g' {} +
Seed File Categorization
Categorize the 201 seed files found as follows:
Keep in seeds/:
- leagues/ - Real league seed data for testing
- questionnaires/ - Sample questionnaires
- samples/ - Small datasets (athletes, teams, games)
- synthetic_email_seeds/ - Test email data
Move to _archive/:
- Duplicate seeds (multiple versions of same league)
- Old/outdated format seeds
- Test seeds that are no longer relevant
- Seeds with names like "original_snapshot-v1/v2/v3"
Delete:
- Empty seed files
- Malformed JSON files
- Seeds that are exact duplicates
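The "Delete" rules above can be checked mechanically before anything is removed. The following is a minimal sketch, assuming seeds are `*.json` files and that "exact duplicate" means byte-identical content; the function name `triage_seeds` and the report keys are illustrative, not part of the existing tooling. It only reports candidates and deletes nothing.

```python
import hashlib
import json
from pathlib import Path


def triage_seeds(seed_root: Path) -> dict[str, list[Path]]:
    """Sort *.json files under seed_root into delete candidates and keepers."""
    report: dict[str, list[Path]] = {
        "empty": [], "malformed": [], "duplicate": [], "keep": [],
    }
    seen: dict[str, Path] = {}  # content hash -> first file with that content
    for path in sorted(seed_root.rglob("*.json")):
        raw = path.read_bytes()
        if not raw.strip():
            report["empty"].append(path)
            continue
        try:
            json.loads(raw)
        except (json.JSONDecodeError, UnicodeDecodeError):
            report["malformed"].append(path)
            continue
        digest = hashlib.sha256(raw).hexdigest()
        if digest in seen:
            # Byte-identical to seen[digest]; the first file in sort order is kept.
            report["duplicate"].append(path)
        else:
            seen[digest] = path
            report["keep"].append(path)
    return report
```

Seeds that differ only in whitespace or key order are not caught by this check and would still need the manual review described under "Move to _archive/".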
Registry Update
Update database/schemas/registry.json:
{
  "$schema": "https://json-schema.org/draft-07/schema",
  "description": "Schema Registry - Single source of truth",
  "version": "2.0.0",
  "schemas": {
    // Update all paths to remove "output-styles/schemas/schemas"
    // Point to "database/schemas/" as base
  },
  "seeds": {
    "description": "Seed data locations",
    "base_path": "database/schemas/seeds/",
    "categories": {
      "leagues": "seeds/leagues/",
      "questionnaires": "seeds/questionnaires/",
      "samples": "seeds/samples/",
      "schemas": "seeds/schemas/",
      "synthetic_emails": "seeds/synthetic_email_seeds/"
    }
  }
}
Success Criteria
- All seed files consolidated into database/schemas/seeds/
- No more output-styles/schemas/schemas/ double nesting
- Old structure archived to _archive/
- All import paths updated
- Registry updated with new paths
- Documentation cleaned up (keep only README.md)
- No duplicate seed files
- Tests still pass after migration
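The "All import paths updated" criterion can be verified with a small scan instead of a manual grep. This is a sketch under stated assumptions: it looks only at `.py` and `.ts` files (the same types as the Phase 6 grep), and it skips anything under a directory named `_archive`, on the assumption that archived copies are allowed to keep old paths. The name `find_stale_references` is hypothetical.

```python
from pathlib import Path

OLD_PREFIX = "output-styles/schemas"  # old location named in this plan
SOURCE_SUFFIXES = {".py", ".ts"}      # same file types as the Phase 6 grep


def find_stale_references(repo_root: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for each line still mentioning OLD_PREFIX."""
    hits = []
    for path in sorted(repo_root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if "_archive" in path.relative_to(repo_root).parts:
            continue  # assumption: archived files may keep old paths
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if OLD_PREFIX in line:
                hits.append((path, number, line.strip()))
    return hits
```

An empty result from `find_stale_references(Path("."))`, run at the repository root, would indicate that no source file still points at the old location.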
Rollback Plan
If something breaks:
# Restore from backup
# The backup is a copy of output-styles/schemas, so seeds sit under schemas/seeds/ inside it
cd database
rm -rf schemas/seeds/*
cp -r _backup_YYYYMMDD_HHMMSS/schemas/seeds/* \
  schemas/seeds/
Next Steps After Consolidation
- Review and deduplicate seeds: Identify and remove duplicate seed files
- Update seed loader: Ensure seed loading scripts use new paths
- Update tests: Fix test fixtures using old paths
- Documentation: Update all docs to reference new structure
- CI/CD: Update any build scripts referencing old paths
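For the seed loader and test fixture updates listed above, the old-to-new mapping implied by the Phase 2 moves can be written as an ordered rule table. This is an illustrative sketch: `PATH_RULES` and `migrate_path` are hypothetical names, and the rules are inferred from this plan (seeds under `domain/v1/seeds/` land in `schemas/seeds/`, everything else loses the `output-styles/schemas` prefix). Rules are ordered most specific first so a shorter prefix never leaves `schemas/schemas` behind.

```python
# Ordered most specific first; each pair is (old prefix, new prefix).
PATH_RULES = [
    ("output-styles/schemas/schemas/domain/v1/seeds/", "schemas/seeds/"),
    ("output-styles/schemas/schemas/", "schemas/"),
    ("output-styles/schemas/", "schemas/"),
]


def migrate_path(old: str) -> str:
    """Map a path in the old layout to its location in the proposed layout."""
    for old_prefix, new_prefix in PATH_RULES:
        if old_prefix in old:
            return old.replace(old_prefix, new_prefix, 1)
    return old  # already in the new layout, or unrelated to schemas
```

For example, `migrate_path("database/output-styles/schemas/schemas/domain/v1/seeds/leagues/example.json")` yields `database/schemas/seeds/leagues/example.json`, where `example.json` is a made-up file name.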
Questions to Answer
- Are there any unique schemas in output-styles/schemas/schemas/domain/v1/ not in database/schemas/domain/v1/?
- Which seeds are actively used vs just historical?
- Do we need all 201 seed files or can some be archived?
- Are there import references in external repos/services?
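The first question above could be answered with a set difference over relative file paths. The sketch below is one possible approach, not an existing script: `relative_files` and `unique_to_old` are hypothetical helpers, and the comparison is by relative path only, so a file that exists in both trees with different content is not reported.

```python
from pathlib import Path


def relative_files(root: Path) -> set[Path]:
    """All regular files under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def unique_to_old(old_root: Path, canonical_root: Path) -> list[Path]:
    """Files under old_root with no same-path counterpart under canonical_root."""
    return sorted(relative_files(old_root) - relative_files(canonical_root))
```

Calling `unique_to_old(Path("database/output-styles/schemas/schemas/domain/v1"), Path("database/schemas/domain/v1"))` from the repository root would list the candidates for Phase 3. Entries under `seeds/` in that list are already covered by Phase 2.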
Status: 🟡 Plan Created - Ready for Review
Next: Execute Phase 1 (Backup) after approval