🎯 Comprehensive Knowledge Organization Plan

Source: data_layer/docs/COMPREHENSIVE_ORGANIZATION_PLAN.md

Executive Summary

This plan unifies FOUR interconnected knowledge systems into a coherent, discoverable architecture:

  1. database/prompts/ - Prompt components, builders, and generation
  2. database/storage/ - Runtime operational data (code-based examples module)
  3. database/knowledge/ - Retrieval, embeddings, intent (code modules)
  4. database/kb_catalog/ - Business rules, registries, manifests

Core Philosophy: Organize by LIFECYCLE STAGE (source → runtime → queryable), not by data type.


🏗️ Proposed Architecture

Level 1: Source of Truth (Version-Controlled Definitions)

database/
├── SOURCE_OF_TRUTH/                    # NEW: Git-tracked, canonical data
│   ├── schemas/                        # ✅ Already exists (keep as-is)
│   │   ├── canonical/                  # JSON Schema (Draft 2020-12)
│   │   ├── generated/                  # Auto-generated adapters
│   │   └── domain/v1/drizzle/          # ✅ Drizzle schemas
│   │
│   ├── config/                         # NEW: Business configuration files
│   │   ├── business/
│   │   │   ├── pricing/
│   │   │   │   ├── tier_presets.v1.json          # ← MOVE FROM output-styles/config
│   │   │   │   ├── combat.pricing.v1.json        # ← MOVE FROM output-styles/config
│   │   │   │   └── README.md
│   │   │   ├── scoring/
│   │   │   │   ├── scoring_model.v1.json         # ← MOVE FROM output-styles/config
│   │   │   │   └── README.md
│   │   │   └── README.md
│   │   │
│   │   ├── sports/                     # NEW: Sport-specific configs
│   │   │   ├── archetypes.json
│   │   │   ├── betting_markets.json
│   │   │   └── README.md
│   │   │
│   │   └── README.md                   # Config governance doc
│   │
│   ├── prompts/                        # NEW: Static prompt templates
│   │   ├── templates/                  # Jinja2/Mustache templates
│   │   │   ├── onboarding/
│   │   │   ├── classification/
│   │   │   └── contract_generation/
│   │   │
│   │   ├── components/                 # Reusable prompt fragments
│   │   │   ├── system_instructions/
│   │   │   ├── few_shot_examples/
│   │   │   └── output_formats/
│   │   │
│   │   └── README.md                   # Template usage guide
│   │
│   └── examples/                       # NEW: Training/reference examples
│       ├── onboarding/
│       │   ├── questionnaire_extraction/
│       │   │   ├── examples.jsonl      # LangMem-ready format
│       │   │   ├── metadata.json
│       │   │   └── README.md
│       │   │
│       │   ├── tier_classification/
│       │   │   ├── examples.jsonl
│       │   │   └── generated_from_config.jsonl  # AUTO-GENERATED
│       │   │
│       │   └── contract_assembly/
│       │       ├── examples.jsonl
│       │       └── README.md
│       │
│       ├── sports_classification/
│       │   ├── by_archetype/
│       │   └── by_market_readiness/
│       │
│       └── README.md                   # Example governance

Level 2: Runtime Services (Operational Layer)

database/
├── knowledge/                          # ✅ Keep as-is (Python modules)
│   ├── embeddings/                     # Vector generation
│   ├── intent/                         # Query classification
│   ├── retrieval/                      # RAG operations
│   ├── storage/                        # Vector DB interface
│   └── templates/                      # Dynamic prompt assembly
│
├── storage/                            # ✅ Keep as-is (Python modules)
│   ├── examples/                       # Code module for example access
│   ├── postgres/                       # PostgreSQL operations
│   ├── redis/                          # Cache layer
│   └── supabase/                       # Supabase operations
│
└── prompts/                            # ✅ Enhance (add builders/)
    ├── builders/                       # NEW: Prompt construction code
    │   ├── onboarding_prompts.py
    │   ├── classification_prompts.py
    │   └── contract_prompts.py
    │
    ├── registry/                       # Prompt metadata
    └── README.md

Level 3: Business Intelligence (Queryable Layer)

database/
├── kb_catalog/                         # ✅ Keep as-is (enhanced)
│   ├── manifests/                      # System inventories
│   ├── registry/                       # Component registries
│   ├── constants/                      # Enum-like data
│   └── config/                         # Catalog configuration
│
└── output-styles/                      # ✅ Restructure (remove config/)
    ├── onboarding/                     # Pipeline stages
    │   ├── 02-ingest-validate-questionnaire/
    │   ├── 03-enhance-documents/
    │   ├── 04-classify/
    │   ├── 05-upsert-and-crossref/
    │   ├── 06-suggest-tiers-and-terms/
    │   │   ├── example_seeds/          # ← Keep (synthetic seeds)
    │   │   ├── examples/               # ← Keep (generated outputs)
    │   │   ├── generate/               # ← Keep (generation code)
    │   │   ├── models/
    │   │   ├── schema/
    │   │   └── README.md
    │   ├── 07-assemble-contract/
    │   ├── 07a-output-contract-export/
    │   ├── 07b-output-gamekeeper-scorekeeper-ui/
    │   └── 07c-output-marketing-nxt-onboarding-materials/
    │
    └── README-ORGANIZATION.md

📦 Migration Plan

Phase 1: Create New Structure (No Deletions)

# 1. Create SOURCE_OF_TRUTH hierarchy
mkdir -p database/SOURCE_OF_TRUTH/{config,prompts,examples}
mkdir -p database/SOURCE_OF_TRUTH/config/business/{pricing,scoring}
mkdir -p database/SOURCE_OF_TRUTH/config/sports
mkdir -p database/SOURCE_OF_TRUTH/prompts/{templates,components}
mkdir -p database/SOURCE_OF_TRUTH/examples/onboarding/{questionnaire_extraction,tier_classification,contract_assembly}
 
# 2. Copy (don't move yet) config files
cp database/output-styles/config/business/pricing/tier_presets.v1.json \
   database/SOURCE_OF_TRUTH/config/business/pricing/
 
cp database/output-styles/config/business/pricing/combat.pricing.v1.json \
   database/SOURCE_OF_TRUTH/config/business/pricing/
 
cp database/output-styles/config/business/scoring/scoring_model.v1.json \
   database/SOURCE_OF_TRUTH/config/business/scoring/
 
# 3. Create README files
touch database/SOURCE_OF_TRUTH/README.md
touch database/SOURCE_OF_TRUTH/config/README.md
touch database/SOURCE_OF_TRUTH/config/business/README.md
touch database/SOURCE_OF_TRUTH/prompts/README.md
touch database/SOURCE_OF_TRUTH/examples/README.md
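Before moving on to Phase 2, it may be worth confirming that Phase 1 produced the expected layout. The sketch below is a hypothetical helper (`missing_paths` is not an existing script); its lists simply mirror the mkdir and cp commands above, so they would need updating if those commands change.

```python
from pathlib import Path

# Directories created by the Phase 1 mkdir commands (mirrors the shell block above).
EXPECTED_DIRS = [
    "config/business/pricing",
    "config/business/scoring",
    "config/sports",
    "prompts/templates",
    "prompts/components",
    "examples/onboarding/questionnaire_extraction",
    "examples/onboarding/tier_classification",
    "examples/onboarding/contract_assembly",
]

# Config files copied (not moved) from output-styles/config/ in step 2.
EXPECTED_FILES = [
    "config/business/pricing/tier_presets.v1.json",
    "config/business/pricing/combat.pricing.v1.json",
    "config/business/scoring/scoring_model.v1.json",
]


def missing_paths(root: Path) -> list[str]:
    """Return every expected directory or file that does not exist under root."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Run from the repository root, `missing_paths(Path("database/SOURCE_OF_TRUTH"))` should return an empty list once Phase 1 is complete.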

Phase 2: Generate Derived Examples

# database/scripts/generate_examples_from_configs.py
"""
Generate training examples from SOURCE_OF_TRUTH configs
Output to SOURCE_OF_TRUTH/examples/ in JSONL format
"""
import json
from pathlib import Path


def format_pricing_response(tier_data: dict) -> str:
    """Placeholder formatter: one 'key: value' line per tier field.
    Adapt to the real tier_presets.v1.json schema."""
    return "\n".join(f"{key}: {value}" for key, value in tier_data.items())


def format_justification(tier_data: dict) -> str:
    """Placeholder: assumes an optional 'description' field on each tier."""
    return tier_data.get("description", "it matches this tier's example category")


def generate_tier_examples():
    """Convert tier_presets.v1.json into LangMem-ready examples"""

    config_path = Path("database/SOURCE_OF_TRUTH/config/business/pricing/tier_presets.v1.json")
    output_path = Path("database/SOURCE_OF_TRUTH/examples/onboarding/tier_classification/generated_from_config.jsonl")

    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    version = config.get("version", 1)
    examples = []

    for tier_name, tier_data in config["tiers"].items():
        # Example 1: Pricing lookup
        examples.append({
            "input": f"What are the pricing terms for {tier_name}?",
            "output": format_pricing_response(tier_data),
            "metadata": {
                "tier": tier_name,
                "type": "pricing_lookup",
                "source": config_path.name,
                "version": version,
            },
        })

        # Example 2: Tier recommendation (only for tiers that declare an example_category)
        if "example_category" in tier_data:
            examples.append({
                "input": f"What tier should I recommend for a {tier_data['example_category']} league?",
                "output": f"Recommend {tier_name} because: {format_justification(tier_data)}",
                "metadata": {
                    "tier": tier_name,
                    "type": "tier_recommendation",
                    "category": tier_data["example_category"],
                    "source": config_path.name,
                    "version": version,
                },
            })

    # Write as JSONL (one JSON object per line)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

    print(f"✅ Generated {len(examples)} examples to {output_path}")


if __name__ == "__main__":
    generate_tier_examples()
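Because the sync step parses these files line by line, a cheap validation pass after generation can catch malformed records early. This is a sketch using only the standard library; `validate_jsonl_lines` is a hypothetical name, and the required keys (`input`, `output`, and a `metadata` object carrying `type`) follow the format described in the examples README template later in this plan.

```python
import json
from typing import Iterable

# Top-level keys every example record must carry.
REQUIRED_KEYS = {"input", "output", "metadata"}


def validate_jsonl_lines(lines: Iterable[str]) -> list[str]:
    """Return human-readable errors for lines that break the example format."""
    errors = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - set(record)
        if missing:
            errors.append(f"line {lineno}: missing {sorted(missing)}")
        elif not isinstance(record["metadata"], dict) or "type" not in record["metadata"]:
            errors.append(f"line {lineno}: metadata must be an object with a 'type'")
    return errors
```

For a file, something like `validate_jsonl_lines(path.read_text(encoding="utf-8").splitlines())` returns an empty list when every record is well formed.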

Phase 3: Sync to Runtime Systems

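# database/scripts/sync_to_runtime_systems.py
"""
Sync SOURCE_OF_TRUTH data to:
- PostgreSQL (JSONB for querying)
- LangMem (vector embeddings for RAG)
- Redis (cache for hot data; not implemented yet, see Week 2 of the checklist)
"""
import json
import os
import re
from pathlib import Path

import psycopg2
from psycopg2.extras import Json
# NOTE: LangMemClient and its store() method are assumed here; confirm them
# against the langmem package (or internal wrapper) you actually use.
from langmem import LangMemClient

DATABASE_URL = os.environ["DATABASE_URL"]
SOURCE_ROOT = Path("database/SOURCE_OF_TRUTH")


def config_type_for(config_file: Path) -> str:
    """'tier_presets.v1.json' -> 'tier_presets' (drop the .vN suffix, since version is its own column)."""
    return re.sub(r"\.v\d+$", "", config_file.stem)


def sync_configs_to_databases():
    """Multi-storage sync strategy"""

    # 1. PostgreSQL: Queryable business rules
    # psycopg2 connections have no execute(); SQL runs through a cursor.
    # `with pg:` commits on success and rolls back on error (it does not close).
    pg = psycopg2.connect(DATABASE_URL)
    try:
        with pg, pg.cursor() as cur:
            for config_file in sorted((SOURCE_ROOT / "config").rglob("*.json")):
                with open(config_file, encoding="utf-8") as f:
                    data = json.load(f)

                # Insert as versioned JSONB; needs UNIQUE (config_type, version) on the table
                cur.execute("""
                    INSERT INTO business_config
                    (config_type, version, file_path, config_data, updated_at)
                    VALUES (%s, %s, %s, %s, NOW())
                    ON CONFLICT (config_type, version)
                    DO UPDATE SET
                        config_data = EXCLUDED.config_data,
                        updated_at = NOW()
                """, (
                    config_type_for(config_file),
                    data.get("version", 1),
                    str(config_file),
                    Json(data),
                ))
    finally:
        pg.close()

    # 2. LangMem: Semantic search
    langmem = LangMemClient(namespace="business-rules")

    for example_file in sorted((SOURCE_ROOT / "examples").rglob("*.jsonl")):
        with open(example_file, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines in JSONL files
                example = json.loads(line)

                langmem.store(
                    content=f"{example['input']}\n\n{example['output']}",
                    metadata={
                        **example.get("metadata", {}),
                        "source_file": str(example_file),
                    },
                )

    print("✅ Sync complete: PostgreSQL + LangMem")


if __name__ == "__main__":
    sync_configs_to_databases()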
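The `ON CONFLICT (config_type, version)` clause only works if `business_config` carries a unique constraint on exactly those columns, which is what the Week 2 checklist item "Create PostgreSQL business_config table" needs to provide. One possible DDL is sketched below; the column types (including an integer `version`) are assumptions inferred from the INSERT statement, not an existing migration, and `ensure_business_config_table` is a hypothetical helper that accepts any DB-API 2.0 connection, such as one returned by `psycopg2.connect`.

```python
# Assumed schema for business_config; column types are inferred from the
# INSERT in sync_to_runtime_systems.py, not taken from an existing migration.
BUSINESS_CONFIG_DDL = """
CREATE TABLE IF NOT EXISTS business_config (
    id          BIGSERIAL PRIMARY KEY,
    config_type TEXT        NOT NULL,
    version     INTEGER     NOT NULL,
    file_path   TEXT        NOT NULL,
    config_data JSONB       NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- Required by ON CONFLICT (config_type, version) in the sync script
    CONSTRAINT business_config_type_version_key UNIQUE (config_type, version)
);
"""


def ensure_business_config_table(conn) -> None:
    """Create the table if it does not exist, using any DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        cur.execute(BUSINESS_CONFIG_DDL)
    finally:
        cur.close()
    conn.commit()
```

Because the statement uses `CREATE TABLE IF NOT EXISTS`, calling the helper at the start of every sync run is harmless.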

🔄 Data Flow Diagram

┌──────────────────────────────────────────────────────────────┐
│                       SOURCE_OF_TRUTH                        │
│       (Git-tracked, version-controlled, single source)       │
│                                                              │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐          │
│  │ Schemas │  │ Configs │  │ Prompts │  │Examples │          │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘          │
└───────┼────────────┼────────────┼────────────┼───────────────┘
        │            │            │            │
        │            └────────┐   │   ┌────────┘
        │                     │   │   │
        ▼                     ▼   ▼   ▼
┌──────────────────────────────────────────────────────────────┐
│                       GENERATION LAYER                       │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  scripts/generate_examples_from_configs.py             │  │
│  │  scripts/sync_to_runtime_systems.py                    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  AUTO-GENERATES:                                             │
│  • Training examples (JSONL)                                 │
│  • Database inserts (SQL)                                    │
│  • Vector embeddings (LangMem)                               │
└──────────────────────────────────────────────────────────────┘
        │            │            │            │
        ▼            ▼            ▼            ▼
┌──────────────────────────────────────────────────────────────┐
│                        RUNTIME LAYER                         │
│         (Queryable, cached, optimized for retrieval)         │
│                                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │PostgreSQL│  │ LangMem  │  │  Redis   │  │ Supabase │      │
│  │ (JSONB)  │  │ (Vector) │  │ (Cache)  │  │ (Sync)   │      │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘      │
└──────────────────────────────────────────────────────────────┘
        │            │            │            │
        ▼            ▼            ▼            ▼
┌──────────────────────────────────────────────────────────────┐
│                      APPLICATION LAYER                       │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │   Prompts    │  │   Knowledge  │  │   Storage    │        │
│  │   Builders   │  │   Retrieval  │  │   Access     │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                              │
│  USED BY:                                                    │
│  • FastAPI endpoints                                         │
│  • LangGraph workflows                                       │
│  • MCP servers                                               │
└──────────────────────────────────────────────────────────────┘

🎯 Key Decisions & Rationale

Decision 1: Why Move Configs to SOURCE_OF_TRUTH?

Problem: tier_presets.v1.json and scoring_model.v1.json were buried in output-styles/config/business/, making them hard to discover.

Solution:

  • Move to top-level SOURCE_OF_TRUTH/config/
  • Make it clear these are canonical business rules, not pipeline artifacts
  • Enable versioning and change tracking via Git

Decision 2: Why Separate Examples from Config?

Problem: Examples were scattered across pipeline stages, making them difficult to use for RAG/training.

Solution:

  • Store source examples in SOURCE_OF_TRUTH/examples/
  • Store generated outputs in output-styles/onboarding/{stage}/examples/
  • Use generate/ scripts to create training examples from configs

Decision 3: Why Keep Prompts Separate?

Problem: Prompt components, templates, and builders were in different places (database/prompts/, code modules).

Solution:

  • Static templates → SOURCE_OF_TRUTH/prompts/templates/
  • Prompt builders (code) → database/prompts/builders/
  • Dynamic assembly → database/knowledge/templates/
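To make the split concrete: the template text lives in a Git-tracked file, and only the filling-in happens in code. The sketch below uses the standard library's `string.Template` as a stand-in for Jinja2/Mustache (a deliberate substitution to keep it dependency-free); the file name and the `$league_name`/`$sport` placeholders used in the example are hypothetical.

```python
from pathlib import Path
from string import Template

# Static, version-controlled templates (SOURCE_OF_TRUTH layer).
TEMPLATE_ROOT = Path("database/SOURCE_OF_TRUTH/prompts/templates")


def load_template(relative_path: str, root: Path = TEMPLATE_ROOT) -> Template:
    """Read a static template file; no logic lives in the file itself."""
    return Template((root / relative_path).read_text(encoding="utf-8"))


def render(template: Template, **values: str) -> str:
    """Builder-side filling; substitute() raises KeyError if a value is missing."""
    return template.substitute(**values)
```

`substitute()` fails loudly on a missing value, which suits templates treated as a source of truth; `safe_substitute()` would leave unknown placeholders in place instead.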

Decision 4: How Do We Avoid Duplication?

Strategy: Single Source of Truth + Generation Scripts

# One config file generates many artifacts:
 
tier_presets.v1.json (SOURCE_OF_TRUTH)
    ↓
    ├─→ Training examples (JSONL)
    ├─→ PostgreSQL rows (JSONB)
    ├─→ LangMem embeddings
    ├─→ Redis cache entries
    └─→ API response templates
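One way to read this fan-out is as a registry of generator functions that all consume the same parsed config, so every derived artifact can be rebuilt rather than edited by hand. A minimal sketch under assumptions: the config shape (`version` plus a `tiers` mapping), the generator names, and the `tier_presets:v{version}:{tier}` cache-key format are hypothetical.

```python
import json
from typing import Any, Callable

Config = dict[str, Any]


def to_training_examples(config: Config) -> list[dict]:
    """One pricing-lookup example per tier (JSONL-ready dicts)."""
    return [
        {"input": f"What are the pricing terms for {name}?",
         "output": json.dumps(tier, sort_keys=True),
         "metadata": {"tier": name, "type": "pricing_lookup", "version": config["version"]}}
        for name, tier in config["tiers"].items()
    ]


def to_postgres_row(config: Config) -> tuple:
    """Parameters for a business_config upsert."""
    return ("tier_presets", config["version"], json.dumps(config))


def to_cache_entries(config: Config) -> dict[str, str]:
    """Hypothetical Redis key -> value pairs, one per tier."""
    return {f"tier_presets:v{config['version']}:{name}": json.dumps(tier)
            for name, tier in config["tiers"].items()}


# Every artifact type is produced from the same single source.
GENERATORS: dict[str, Callable[[Config], Any]] = {
    "training_examples": to_training_examples,
    "postgres_row": to_postgres_row,
    "cache_entries": to_cache_entries,
}


def generate_all(config: Config) -> dict[str, Any]:
    """Run every registered generator against the one config."""
    return {name: generate(config) for name, generate in GENERATORS.items()}
```

Adding a new artifact type then means registering one more function rather than copying data into another system.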

📝 README Templates

SOURCE_OF_TRUTH/README.md

# Source of Truth
 
This directory contains all **canonical, version-controlled data** for the AltSports system.
 
## Philosophy
 
**Single Source of Truth**: All derived data (database records, embeddings, cache entries) is GENERATED from files here.
 
## Structure
 
- **`schemas/`**: JSON Schema definitions (already exists)
- **`config/`**: Business configuration files (pricing, scoring, sports)
- **`prompts/`**: Static prompt templates (Jinja2/Mustache)
- **`examples/`**: Training and reference examples (JSONL format)
 
## Usage
 
1. **Edit files here** (version-controlled)
2. **Run generation scripts** to sync to runtime systems:
   ```bash
   python database/scripts/generate_examples_from_configs.py
   python database/scripts/sync_to_runtime_systems.py
   ```
3. Verify sync in PostgreSQL, LangMem, Redis

## ⚠️ Important

- Never edit data in runtime systems directly
- Always update SOURCE_OF_TRUTH first
- Always run sync scripts after changes

SOURCE_OF_TRUTH/config/README.md

# Business Configuration Files

Canonical configuration for pricing, scoring, and sports logic.

## Files

### Pricing
- **`tier_presets.v1.json`**: Pricing tiers, SLAs, contract templates
- **`combat.pricing.v1.json`**: Combat sports vertical pricing

### Scoring
- **`scoring_model.v1.json`**: Scoring weights, modifiers, thresholds

### Sports
- **`archetypes.json`**: Sport classification rules
- **`betting_markets.json`**: Market definitions by sport

## Generation

These configs auto-generate:
- Training examples → `SOURCE_OF_TRUTH/examples/`
- Database records → PostgreSQL `business_config` table
- Vector embeddings → LangMem `business-rules` namespace

Run:
```bash
python database/scripts/generate_examples_from_configs.py
python database/scripts/sync_to_runtime_systems.py
```

SOURCE_OF_TRUTH/examples/README.md

# Training & Reference Examples

JSONL-formatted examples for LLM training, RAG, and testing.

## Format

All examples follow this structure:
```json
{
  "input": "User query or task description",
  "output": "Expected response or result",
  "metadata": {
    "type": "example_type",
    "source": "originating_config_file",
    "version": 1
  }
}
```

## Categories

- **`onboarding/`**: Questionnaire processing examples
- **`sports_classification/`**: Sport archetype examples
- **`onboarding/contract_assembly/`**: Contract assembly examples

## Auto-Generated Examples

Files ending in `generated_from_config.jsonl` are AUTO-GENERATED by:

    python database/scripts/generate_examples_from_configs.py

Do not edit these manually. Edit the source config instead.


🚀 Implementation Checklist

Week 1: Foundation

  • Create SOURCE_OF_TRUTH/ directory structure
  • Write SOURCE_OF_TRUTH/README.md
  • Copy (don't move) config files from output-styles/config/
  • Create database/scripts/generate_examples_from_configs.py
  • Test example generation locally

Week 2: Sync Infrastructure

  • Create PostgreSQL business_config table
  • Write database/scripts/sync_to_runtime_systems.py
  • Test PostgreSQL sync
  • Test LangMem sync
  • Add Redis caching layer

Week 3: Prompt Organization

  • Move static templates to SOURCE_OF_TRUTH/prompts/templates/
  • Create database/prompts/builders/ for code modules
  • Update imports in existing code
  • Test prompt building

Week 4: Integration & Testing

  • Update FastAPI endpoints to use new paths
  • Update LangGraph workflows
  • Update MCP servers
  • Run full system test
  • Document migration in CHANGELOG

Week 5: Cleanup

  • Remove old output-styles/config/ (after verification)
  • Update all README files
  • Create developer documentation
  • Train team on new structure

🎓 Developer Guide

How to Add a New Business Config

  1. Create config in SOURCE_OF_TRUTH:

    vim database/SOURCE_OF_TRUTH/config/business/my_new_config.v1.json
  2. Update generation script:

    # database/scripts/generate_examples_from_configs.py
    def generate_my_new_examples():
        # Add logic here
        pass
  3. Run generation, then sync:

    python database/scripts/generate_examples_from_configs.py
    python database/scripts/sync_to_runtime_systems.py
  4. Verify:

    SELECT * FROM business_config WHERE config_type = 'my_new_config';
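The "Update generation script" step leaves `generate_my_new_examples()` as a stub. A hedged sketch of one way to fill it in, assuming (hypothetically) that `my_new_config.v1.json` contains a `rules` list whose entries have `name` and `description` fields; the example type `rule_lookup` is likewise an illustrative choice. It writes the same input/output/metadata JSONL shape used by the tier examples.

```python
import json
from pathlib import Path


def generate_my_new_examples(config_path: Path, output_path: Path) -> int:
    """Write one example per rule and return how many were written.

    Assumes a hypothetical config shape:
    {"version": 1, "rules": [{"name": ..., "description": ...}]}
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    version = config.get("version", 1)

    output_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with output_path.open("w", encoding="utf-8") as out:
        for rule in config.get("rules", []):
            example = {
                "input": f"What does the {rule['name']} rule do?",
                "output": rule["description"],
                "metadata": {
                    "type": "rule_lookup",
                    "source": config_path.name,
                    "version": version,
                },
            }
            out.write(json.dumps(example) + "\n")  # JSONL: one object per line
            count += 1
    return count
```

Paths are passed in rather than hard-coded, so the function can point at `database/SOURCE_OF_TRUTH/...` in the real script and at a temporary directory in tests.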

How to Use Examples in RAG

from langmem import LangMemClient
 
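# Query examples
# NOTE: LangMemClient and query() are assumed to match the client used by
# sync_to_runtime_systems.py; verify the signature against your langmem version.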
client = LangMemClient(namespace="business-rules")
results = client.query(
    query="What tier for a combat league?",
    filters={"type": "tier_recommendation"}
)
 
for result in results:
    print(f"Example: {result.content}")
    print(f"Metadata: {result.metadata}")
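When LangMem is not available (for example in local tests), the same query-plus-metadata-filter flow can be approximated directly over the JSONL files. This sketch swaps semantic embeddings for plain word overlap, so it illustrates filtering and ranking only and does not reproduce LangMem's scoring; `query_examples` is a hypothetical name.

```python
import json
import re
from pathlib import Path


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_examples(root: Path, query: str, filters: dict, limit: int = 3) -> list[dict]:
    """Examples whose metadata matches every filter, ranked by word overlap with the query."""
    query_words = _words(query)
    scored = []
    for path in sorted(root.rglob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            example = json.loads(line)
            metadata = example.get("metadata", {})
            if any(metadata.get(key) != value for key, value in filters.items()):
                continue  # metadata filter, like filters={...} in the LangMem call
            overlap = len(query_words & _words(f"{example['input']} {example['output']}"))
            scored.append((overlap, example))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep file order
    return [example for _, example in scored[:limit]]
```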

How to Build Prompts

from database.prompts.builders import onboarding_prompts
 
# Use prompt builder (league_json, tier_presets and langmem_examples are assumed to be loaded earlier)
prompt = onboarding_prompts.build_tier_classification_prompt(
    league_data=league_json,
    config=tier_presets,
    examples=langmem_examples
)
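The builder belongs to the proposed database/prompts/builders/ package, so its signature is not fixed yet. A hypothetical shape for `build_tier_classification_prompt`, assuming the config exposes a `tiers` mapping and each retrieved example is a dict with `input` and `output` (LangMem results would first need converting to that form):

```python
import json


def build_tier_classification_prompt(league_data: dict, config: dict, examples: list[dict]) -> str:
    """Assemble instructions, tier options, few-shot examples, and the league to classify."""
    tier_names = ", ".join(sorted(config["tiers"]))
    few_shot = "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples
    )
    return (
        "You recommend a pricing tier for a sports league.\n"
        f"Available tiers: {tier_names}\n\n"
        f"Examples:\n{few_shot or '(none)'}\n\n"
        f"League:\n{json.dumps(league_data, indent=2, sort_keys=True)}\n\n"
        "Answer with one tier name and a short justification."
    )
```

Keeping the builder a pure function of its inputs makes the assembled prompt easy to snapshot-test.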

📊 Success Metrics

After implementation, we should achieve:

  1. Discoverability: Developers find config files in < 30 seconds
  2. Consistency: Zero manual config updates to runtime systems
  3. Versioning: 100% of config changes tracked in Git
  4. RAG Quality: 30% improvement in tier recommendation accuracy
  5. Maintainability: 50% reduction in "where is X?" questions

🔗 Related Documentation


Last Updated: 2025-01-16
Next Review: After Phase 2 completion

partnership@altsportsdata.com
dev@altsportsleagues.ai

2025 © AltSportsLeagues.ai. Powered by AI-driven sports business intelligence.
