Schema & Seed Consolidation - Executive Summary

Source: data_layer/docs/CONSOLIDATION_SUMMARY.md


Status: ✅ COMPLETE
Date: January 11, 2025
Files Processed: 327 seed files + 20+ schema files

What Was Done

Successfully consolidated scattered schema and seed files from multiple directories into two centralized locations:

1. Schema Consolidation: kb_catalog/schemas/ → schemas/

Files Copied (originals kept until the cleanup step):

  • ✅ Type definitions → schemas/types/ (types.py, types.js, sport_types.py, user-roles.ts)
  • ✅ Archetypes → schemas/archetypes/
  • ✅ API schemas → schemas/integrations/api/
  • ✅ Forms → schemas/integrations/forms/
  • ✅ Output schemas → schemas/integrations/output/
  • ✅ ASD integration → schemas/integrations/asd/
  • ✅ Prisma docs → schemas/infrastructure/prisma/
  • ✅ Workflows → schemas/workflows/

2. Seed Consolidation: output-styles/schemas/ → few_shot_examples_training_data/data/

Files Converted (JSON → JSONL):

  • ✅ 38 league examples → league_examples.jsonl
  • ✅ 53 questionnaires → questionnaires.jsonl
  • ✅ 75 schema definitions → schema_definitions.jsonl
  • ✅ 5 sample datasets → sample_data.jsonl
  • ✅ 156 legacy seeds → legacy_seeds.jsonl

Total: 327 examples (38 + 53 + 75 + 5 + 156), now stored as JSONL (one JSON object per line), ready to be searched and seeded into the database
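The JSON → JSONL step can be pictured roughly as follows. This is a minimal sketch, not the contents of scripts/consolidate_seeds.py. It assumes each source file holds one JSON object or a list of objects, and the `category` / `source_file` wrapper fields are hypothetical stand-ins for the real metadata.

```python
import json
from pathlib import Path


def convert_dir_to_jsonl(src_dir: Path, out_file: Path, category: str) -> int:
    """Write every record from src_dir/*.json to out_file as one JSON object per line.

    A file holding a JSON list contributes one line per item; any other JSON value
    contributes a single line. Returns the number of lines written.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_file.open("w", encoding="utf-8") as out:
        for path in sorted(src_dir.glob("*.json")):  # sorted => deterministic output
            data = json.loads(path.read_text(encoding="utf-8"))
            for record in data if isinstance(data, list) else [data]:
                wrapped = {
                    "category": category,      # hypothetical metadata field
                    "source_file": path.name,  # hypothetical provenance field
                    "data": record,
                }
                # No indent => the whole record stays on one line; non-ASCII kept as-is.
                out.write(json.dumps(wrapped, ensure_ascii=False) + "\n")
                count += 1
    return count
```

Keeping the source file name on every line makes it possible to trace any JSONL record back to the JSON file it came from.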

New Structure

database/
├── schemas/                                    # 🎯 ALL SCHEMAS HERE
│   ├── types/                                  # Type definitions
│   ├── archetypes/                             # Sport archetypes
│   ├── workflows/                              # Workflow schemas
│   ├── integrations/                           # API, forms, outputs, ASD
│   ├── infrastructure/                         # Prisma, database
│   └── domain/v1/                              # Business domains
│
├── few_shot_examples_training_data/            # 🎯 ALL SEED/TRAINING DATA HERE
│   └── data/
│       ├── league_examples.jsonl               # 38 examples
│       ├── questionnaires.jsonl                # 53 examples
│       ├── schema_definitions.jsonl            # 75 examples
│       ├── sample_data.jsonl                   # 5 examples
│       ├── legacy_seeds.jsonl                  # 156 examples
│       └── ...                                 # Other training files
│
└── kb_catalog/schemas/                         # Knowledge base metadata ONLY
    ├── mappings/                               # (business context, not schemas)
    ├── metadata/
    └── usage-guides/

Files Created or Updated

  1. ✅ scripts/consolidate_seeds.py - Conversion script (327 files processed)
  2. ✅ scripts/cleanup_consolidated_dirs.sh - Automated cleanup script
  3. ✅ docs/schema.consolidation-complete.md - Detailed documentation
  4. ✅ CONSOLIDATION_CLEANUP_GUIDE.md - Cleanup instructions
  5. ✅ schemas/types/README.md - Type definitions guide
  6. ✅ Updated schemas/README.md - Added consolidation section

Benefits Achieved

🎯 Single Source of Truth

  • All schemas in one place (schemas/)
  • All training data in one place (few_shot_examples_training_data/)

📊 Better Organization

  • Type definitions grouped logically
  • Integration schemas separated from domain schemas
  • Seed data in a uniform, line-delimited JSONL format that is easy to search

⚡ Performance Improvements

  • JSONL records can be loaded into the database and indexed there
  • Fast semantic search over training examples once they are seeded
  • No duplicate files across directories (once the cleanup script has run)
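As a rough illustration of what "database-indexed" means here, the sketch below loads a JSONL file into SQLite (Python's built-in `sqlite3` module) and adds an index. SQLite stands in for whatever database the real seed script targets. The `examples` table and the `category` field are assumptions, not the project's actual schema. A plain index like this speeds up exact lookups; semantic search would additionally need embeddings, which are out of scope here.

```python
import json
import sqlite3
from pathlib import Path


def load_jsonl_into_sqlite(conn: sqlite3.Connection, jsonl_path: Path) -> int:
    """Insert each non-blank JSONL line as one row and index the (hypothetical) category."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS examples ("
        "id INTEGER PRIMARY KEY, category TEXT, source TEXT, body TEXT)"
    )
    # A B-tree index lets "WHERE category = ?" use an index lookup instead of a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_examples_category ON examples(category)")
    rows = []
    with jsonl_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # validates the line; assumes each line is a JSON object
            rows.append((record.get("category"), jsonl_path.name, line.strip()))
    conn.executemany("INSERT INTO examples (category, source, body) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Typical use is to call it once per file on a shared connection, then query with `conn.execute("SELECT body FROM examples WHERE category = ?", ("league",))`.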

🔧 Easier Maintenance

  • Edit the JSONL → run the seed script → the database is back in sync
  • Clear separation: schemas vs. documentation
  • Version control friendly (JSONL diffs cleanly)
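The "JSONL diffs cleanly" point follows from storing one record per line: editing one example changes exactly one line of the file. Below is a small illustration using the standard library's `difflib`. The records and the `quality` field are made up for the example.

```python
import difflib
import json


def to_jsonl_lines(records):
    """Serialize records as JSONL lines; sort_keys keeps the text stable across runs."""
    return [json.dumps(r, sort_keys=True) + "\n" for r in records]


# Two versions of a tiny seed file: only record 2's (made-up) quality value changes.
before = [{"id": 1, "quality": 0.9}, {"id": 2, "quality": 0.5}, {"id": 3, "quality": 0.7}]
after = [{"id": 1, "quality": 0.9}, {"id": 2, "quality": 0.8}, {"id": 3, "quality": 0.7}]

diff = list(difflib.unified_diff(to_jsonl_lines(before), to_jsonl_lines(after), n=0))
# Keep only the changed lines, skipping the "---"/"+++" file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("".join(changed), end="")
# -{"id": 2, "quality": 0.5}
# +{"id": 2, "quality": 0.8}
```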

📈 Consistency

  • All examples have standardized metadata
  • Quality scoring included
  • Usage tracking enabled
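One way to keep metadata standardized is to check it on every record. The sketch below assumes hypothetical field names (`id`, `category`, `quality_score`, `usage_count`) and a 0 to 1 quality range. The real seed files may name these fields differently.

```python
import json

# Hypothetical metadata contract; adjust to the fields the seed files actually use.
REQUIRED_FIELDS = {"id": str, "category": str, "quality_score": (int, float), "usage_count": int}


def metadata_problems(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means it passes)."""
    record = json.loads(line)
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(record[field], expected_type) or isinstance(record[field], bool):
            problems.append(f"{field} has type {type(record[field]).__name__}")
    score = record.get("quality_score")
    if isinstance(score, (int, float)) and not 0 <= score <= 1:  # assumed 0..1 scale
        problems.append("quality_score outside [0, 1]")
    return problems
```

Running this over every line of each JSONL file before seeding catches incomplete records early.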

Next Steps

Immediate Actions

  1. Verify the consolidation works:

    cd database/
    python -c "from schemas.types.types import *; print('✅ Imports working')"
    ls -lh few_shot_examples_training_data/data/*.jsonl
  2. Seed the database (optional, for training system):

    uv run python scripts/seed.examples.py
  3. Update any imports in your codebase:

    • Old: from kb_catalog.schemas.types import X
    • New: from schemas.types.types import X
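The mechanical part of step 3 can be scripted. This is a sketch only: `rewrite_imports` is a hypothetical helper, not one of the scripts listed above. It handles only the two dotted module paths from the Import Paths Changed table, in `from X import ...` and `import X` statements. It does not touch `from kb_catalog.schemas import types` forms, filesystem paths such as kb_catalog/schemas/api/, or attribute references like `kb_catalog.schemas.types.X` elsewhere in the code.

```python
import re

# Old -> new module paths, taken from the "Import Paths Changed" table.
IMPORT_MAP = {
    "kb_catalog.schemas.sport_types": "schemas.types.sport_types",
    "kb_catalog.schemas.types": "schemas.types.types",
}

# Match an old module path right after "from"/"import" at the start of a line;
# \b stops "kb_catalog.schemas.types" from matching "kb_catalog.schemas.types_extra".
_PATTERN = re.compile(
    r"^(\s*(?:from|import)\s+)(" + "|".join(re.escape(k) for k in IMPORT_MAP) + r")\b",
    re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Return source with the old kb_catalog import paths replaced by the new ones."""
    return _PATTERN.sub(lambda m: m.group(1) + IMPORT_MAP[m.group(2)], source)
```

Review the rewritten files (for example with `git diff`) before committing. Plain `import kb_catalog.schemas.types` statements also need their call sites updated by hand.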

Optional Cleanup

After verifying everything works (1-2 days):

# Automated cleanup with backup
./scripts/cleanup_consolidated_dirs.sh
 
# OR manual verification first
cat CONSOLIDATION_CLEANUP_GUIDE.md

The cleanup script will:

  • ✅ Create timestamped backup
  • ✅ Remove old directories
  • ✅ Provide rollback instructions
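These three steps follow a common backup-then-delete pattern, sketched here in Python. This is not `cleanup_consolidated_dirs.sh`; its exact behavior is described in CONSOLIDATION_CLEANUP_GUIDE.md. The `_backup_<timestamp>` naming and the `backup_and_remove` helper are assumptions for illustration.

```python
import shlex
import shutil
from datetime import datetime
from pathlib import Path


def backup_and_remove(root: Path, old_dirs: list[str]) -> Path:
    """Copy each old directory into a timestamped backup, then delete the originals.

    Nothing is deleted unless every copy succeeded, because copytree raises on failure.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_root = root / f"_backup_{stamp}"  # hypothetical naming scheme
    # 1. Back up everything first (copytree creates missing parent directories).
    for rel in old_dirs:
        shutil.copytree(root / rel, backup_root / rel)
    # 2. Only then remove the originals.
    for rel in old_dirs:
        shutil.rmtree(root / rel)
    # 3. Print shell-safe rollback commands.
    for rel in old_dirs:
        src = shlex.quote(str(backup_root / rel))
        dst = shlex.quote(str((root / rel).parent))
        print(f"rollback: cp -R {src} {dst}/")
    return backup_root
```

Splitting the copy and delete phases into separate loops ensures a failed copy never leaves a directory deleted without a backup.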

Recommended Timeline

  • Day 1 (Today): ✅ Consolidation complete
  • Day 2-3: Test imports and seed retrieval
  • Week 1: Update import references in code
  • Week 2: Run cleanup script after verification
  • Week 4: Remove backup if no issues

Quick Reference

Import Paths Changed

| Old Path | New Path |
|---|---|
| kb_catalog.schemas.types | schemas.types.types |
| kb_catalog.schemas.sport_types | schemas.types.sport_types |
| kb_catalog/schemas/api/ | schemas/integrations/api/ |

Seed File Locations

| Category | File | Examples |
|---|---|---|
| Leagues | league_examples.jsonl | 38 |
| Questionnaires | questionnaires.jsonl | 53 |
| Schemas | schema_definitions.jsonl | 75 |
| Samples | sample_data.jsonl | 5 |
| Legacy | legacy_seeds.jsonl | 156 |
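The counts in this table can double as a regression check. The sketch below compares them with the number of non-blank lines in each file, assuming one example per line. `check_counts` is a hypothetical helper; the `EXPECTED` values are copied from the table.

```python
from pathlib import Path

# Expected example counts, copied from the Seed File Locations table (total 327).
EXPECTED = {
    "league_examples.jsonl": 38,
    "questionnaires.jsonl": 53,
    "schema_definitions.jsonl": 75,
    "sample_data.jsonl": 5,
    "legacy_seeds.jsonl": 156,
}


def check_counts(data_dir: Path) -> dict[str, tuple[int, int]]:
    """Return {file: (expected, actual)} for every file whose record count differs."""
    mismatches = {}
    for name, expected in EXPECTED.items():
        path = data_dir / name
        actual = 0
        if path.exists():  # a missing file counts as 0 records
            with path.open(encoding="utf-8") as f:
                actual = sum(1 for line in f if line.strip())
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


# Usage (from database/):
# print(check_counts(Path("few_shot_examples_training_data/data")) or "all counts match")
```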

Documentation

  • 📖 Full details: docs/schema.consolidation-complete.md
  • 🧹 Cleanup guide: CONSOLIDATION_CLEANUP_GUIDE.md
  • 📚 Schema docs: schemas/README.md
  • 🎓 Training data: few_shot_examples_training_data/README.md

Verification Commands

cd <your-checkout>/mcp-servers/servers/mcp-server-altsportsleagues.ai/2.1-cloud-run-docker-mcp/database
 
# Test Python imports
python -c "from schemas.types.types import *"
 
# Count JSONL examples
wc -l few_shot_examples_training_data/data/*.jsonl
 
# Validate JSONL format
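python -c "
import json
from pathlib import Path
# Parse every non-blank line; report the first invalid line in each file
for f in sorted(Path('few_shot_examples_training_data/data').glob('*.jsonl')):
    with open(f, encoding='utf-8') as file:
        for n, line in enumerate(file, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                print(f'{f.name}:{n}: invalid JSON ({e})')
                break
        else:
            print(f'{f.name}: ✓ Valid')
"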
 
# Check new schema structure
tree -L 3 schemas/ -I '__pycache__|*.pyc'

Rollback Plan

If needed, restore from backup:

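# The cleanup script creates backups automatically.
# Manual backup (run this BEFORE the cleanup script); each tree keeps its own path:
mkdir -p _manual_backup/kb_catalog _manual_backup/output-styles
cp -R kb_catalog/schemas _manual_backup/kb_catalog/
cp -R output-styles/schemas _manual_backup/output-styles/

# Restore if needed ("/." copies the directory's contents, including dotfiles)
mkdir -p kb_catalog/schemas output-styles/schemas
cp -R _manual_backup/kb_catalog/schemas/. kb_catalog/schemas/
cp -R _manual_backup/output-styles/schemas/. output-styles/schemas/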
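Before relying on a backup, it is worth confirming that it is a complete, byte-identical copy. The sketch below uses SHA-256 checksums from `hashlib`. `missing_or_changed` is a hypothetical helper, and the paths in the usage comment are only an example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def missing_or_changed(original: Path, backup: Path) -> list[str]:
    """List files under `original` that are absent from, or differ in, `backup`."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"changed: {rel}")
    return problems


# Example (from database/); an empty list means the backup is complete:
# print(missing_or_changed(Path("kb_catalog/schemas"), Path("_manual_backup/kb_catalog/schemas")))
```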

Success Metrics

  • ✅ 327 seed files converted to JSONL
  • ✅ 20+ schema files reorganized
  • ✅ 5 JSONL categories created
  • ✅ 2 main directories now hold all schemas and seed data
  • ✅ Zero data loss (all files copied, not moved)
  • ✅ Backup strategy in place
  • ✅ Documentation complete

Support

If you encounter issues:

  1. Check CONSOLIDATION_CLEANUP_GUIDE.md for troubleshooting
  2. Verify files exist in new locations
  3. Review docs/schema.consolidation-complete.md for details
  4. Restore from backup if needed
  5. Re-run scripts/consolidate_seeds.py if the JSONL files need to be regenerated

Status: Ready for use ✅
Risk Level: Low (backups in place, files copied not moved)
Estimated Impact: High (better organization, performance, maintenance)


partnership@altsportsdata.com, dev@altsportsleagues.ai

2025 © AltSportsLeagues.ai. Powered by AI-driven sports business intelligence.