Data Fabric Architecture

Source: data_layer/docs/DATA_FABRIC_ARCHITECTURE.md

🌐 Data Fabric Architecture - Complete System Design

Version: 2.0
Date: 2025-10-16
Status: Production Architecture

Purpose: Unified, intelligent data architecture supporting multi-storage retrieval, schema-driven validation, prompt composition, and AI-powered generation pipelines.


🎯 Core Concept: The Complete Data Flow

┌───────────────────────────────────────────────────────────────────────┐
│                     DEFINITIONS (Source of Truth)                     │
│  Git-tracked, version-controlled, single source of truth              │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐           │
│  │Schemas │  │Configs │  │Prompts │  │Examples│  │  Seeds │           │
│  │(shape) │  │(values)│  │(instr) │  │(train) │  │(synth) │           │
│  └───┬────┘  └───┬────┘  └───┬────┘  └───┬────┘  └───┬────┘           │
│      │           │           │           │           │                │
└──────┼───────────┼───────────┼───────────┼───────────┼────────────────┘
       │           │           │           │           │
       │     ┌─────┴───────────┴────────┐  │           │
       │     │                          │  │           │
       ▼     ▼                          ▼  ▼           ▼
┌───────────────────────────────────────────────────────────────────────┐
│                        WEAVE (Transformation)                         │
│  Python modules that BUILD, COMPOSE, EMBED, GENERATE                  │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                 │
│  │   Builders   │  │  Generators  │  │  Embedders   │                 │
│  ├──────────────┤  ├──────────────┤  ├──────────────┤                 │
│  │• Prompt      │  │• Examples    │  │• Vector      │                 │
│  │  Composer    │  │  from Config │  │  Embeddings  │                 │
│  │• Schema      │  │• Pydantic    │  │• Semantic    │                 │
│  │  Generator   │  │  from JSON   │  │  Index       │                 │
│  │• Config      │  │• TypeScript  │  │• LangMem     │                 │
│  │  Loader      │  │  from JSON   │  │  Sync        │                 │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘                 │
│         │                 │                 │                         │
└─────────┼─────────────────┼─────────────────┼─────────────────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│                         VIEWS (Materialized)                          │
│  Multi-storage, optimized for specific access patterns                │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │PostgreSQL│  │ LangMem  │  │  Redis   │  │Supabase  │               │
│  │  (JSONB) │  │ (Vector) │  │ (Cache)  │  │  (Auth)  │               │
│  ├──────────┤  ├──────────┤  ├──────────┤  ├──────────┤               │
│  │• Query   │  │• RAG     │  │• Hot     │  │• User    │               │
│  │• Join    │  │• Semantic│  │  Configs │  │  State   │               │
│  │• Version │  │  Search  │  │• Session │  │• Realtime│               │
│  └──────┬───┘  └──────┬───┘  └──────┬───┘  └──────┬───┘               │
│         │             │             │             │                   │
└─────────┼─────────────┼─────────────┼─────────────┼───────────────────┘
          │             │             │             │
          └─────────────┴──────┬──────┴─────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────────────────┐
│                        APPLICATION (Consumers)                        │
│  FastAPI + LangGraph + MCP Servers + Next.js Frontend                 │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐           │
│  │  LLM Pipeline  │  │  Validation    │  │   Frontend     │           │
│  ├────────────────┤  ├────────────────┤  ├────────────────┤           │
│  │• Prompt with   │  │• Pydantic      │  │• Zod           │           │
│  │  embedded ex.  │  │  validates     │  │  validates     │           │
│  │• Generate with │  │  backend       │  │  frontend      │           │
│  │  constraints   │  │• Enforce       │  │• TypeScript    │           │
│  │• Retrieve      │  │  schema        │  │  types         │           │
│  │  semantically  │  │• Return JSON   │  │• UI safety     │           │
│  └────────────────┘  └────────────────┘  └────────────────┘           │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

📁 Complete Directory Structure

data_layer/                                    # The unified data fabric
│
├── README.md                                  # This file
├── DATA_FABRIC_ARCHITECTURE.md                # Architecture overview
│
├── definitions/                               # 🔷 TIER 1: Source of Truth
│   │                                          # Git-tracked, canonical definitions
│   ├── schemas/                               # JSON Schema (canonical)
│   │   ├── canonical/                         # Draft 2020-12 JSON Schema
│   │   │   ├── contract-terms.schema.json
│   │   │   ├── questionnaire.schema.json
│   │   │   ├── tier-classification.schema.json
│   │   │   └── README.md
│   │   │
│   │   ├── generated/                         # AUTO-GENERATED from canonical
│   │   │   ├── pydantic/                      # Python validation
│   │   │   │   ├── contract_terms.py
│   │   │   │   ├── questionnaire.py
│   │   │   │   └── __init__.py
│   │   │   │
│   │   │   ├── typescript/                    # Frontend types
│   │   │   │   ├── contract-terms.ts
│   │   │   │   ├── questionnaire.ts
│   │   │   │   └── index.ts
│   │   │   │
│   │   │   ├── zod/                           # Frontend validation
│   │   │   │   ├── contract-terms.zod.ts
│   │   │   │   ├── questionnaire.zod.ts
│   │   │   │   └── index.ts
│   │   │   │
│   │   │   └── drizzle/                       # ORM schemas
│   │   │       ├── contract-terms.ts
│   │   │       ├── questionnaire.ts
│   │   │       └── index.ts
│   │   │
│   │   ├── generate_all.py                    # Master generator script
│   │   └── README.md                          # Schema governance
│   │
│   ├── config/                                # Business configuration
│   │   ├── business/
│   │   │   ├── pricing/
│   │   │   │   ├── tier_presets.v1.json       # Tier pricing & terms
│   │   │   │   ├── combat.pricing.v1.json     # Combat vertical pricing
│   │   │   │   ├── standard.pricing.v1.json   # Standard pricing
│   │   │   │   └── README.md
│   │   │   │
│   │   │   ├── scoring/
│   │   │   │   ├── scoring_model.v1.json      # Scoring weights & thresholds
│   │   │   │   ├── tier_thresholds.v1.json
│   │   │   │   └── README.md
│   │   │   │
│   │   │   ├── rules/
│   │   │   │   ├── validation_rules.json
│   │   │   │   ├── business_logic.json
│   │   │   │   └── README.md
│   │   │   │
│   │   │   └── README.md
│   │   │
│   │   ├── sports/
│   │   │   ├── archetypes.json                # Sport classifications
│   │   │   ├── betting_markets.json           # Market definitions
│   │   │   ├── stat_mappings.json             # Sport-specific stats
│   │   │   └── README.md
│   │   │
│   │   ├── workflows/
│   │   │   ├── onboarding.config.json
│   │   │   ├── contract_generation.config.json
│   │   │   └── README.md
│   │   │
│   │   └── README.md                          # Config governance
│   │
│   ├── prompts/                               # Static prompt definitions
│   │   ├── templates/                         # Jinja2/Mustache templates
│   │   │   ├── onboarding/
│   │   │   │   ├── questionnaire_extraction.j2
│   │   │   │   ├── enhancement.j2
│   │   │   │   └── classification.j2
│   │   │   │
│   │   │   ├── contract/
│   │   │   │   ├── tier_1_template.j2
│   │   │   │   ├── tier_2_template.j2
│   │   │   │   └── variable_sections.j2
│   │   │   │
│   │   │   └── README.md
│   │   │
│   │   ├── components/                        # Reusable prompt blocks
│   │   │   ├── system_instructions/
│   │   │   │   ├── base_agent.md
│   │   │   │   ├── tier_classifier.md
│   │   │   │   └── contract_assembler.md
│   │   │   │
│   │   │   ├── few_shot_patterns/
│   │   │   │   ├── classification_pattern.md
│   │   │   │   └── extraction_pattern.md
│   │   │   │
│   │   │   ├── output_formats/
│   │   │   │   ├── json_structure.md
│   │   │   │   └── markdown_contract.md
│   │   │   │
│   │   │   └── README.md
│   │   │
│   │   └── README.md                          # Prompt template guide
│   │
│   ├── examples/                              # Training & reference data
│   │   ├── seeds/                             # Hand-curated golden examples
│   │   │   ├── onboarding/
│   │   │   │   ├── questionnaire-extraction.jsonl
│   │   │   │   ├── enhancement.jsonl
│   │   │   │   ├── classification.jsonl
│   │   │   │   └── tier-suggestion.jsonl
│   │   │   │
│   │   │   ├── contract-generation/
│   │   │   │   ├── tier-1-examples.jsonl
│   │   │   │   ├── tier-2-examples.jsonl
│   │   │   │   └── combat-examples.jsonl
│   │   │   │
│   │   │   └── README.md
│   │   │
│   │   ├── generated/                         # AUTO-GENERATED from configs
│   │   │   ├── pricing-examples.jsonl         # From tier_presets
│   │   │   ├── scoring-examples.jsonl         # From scoring_model
│   │   │   ├── sport-classification.jsonl     # From archetypes
│   │   │   └── README.md                      # Generation docs
│   │   │
│   │   ├── validation/                        # Edge cases & tests
│   │   │   ├── edge-cases.jsonl
│   │   │   ├── negative-examples.jsonl
│   │   │   └── README.md
│   │   │
│   │   └── README.md                          # Example governance
│   │
│   └── kb_catalog/                            # Business intelligence
│       ├── constants/                         # Python constants
│       │   ├── __init__.py
│       │   ├── business_rules.py              # Importable rules
│       │   ├── sport_classifications.py
│       │   ├── field_mappings.py
│       │   └── validation_rules.py
│       │
│       ├── registry/                          # Manual registries
│       │   ├── core_schemas_registry.json
│       │   ├── workflow_registry.json
│       │   └── triage_rules.json
│       │
│       ├── manifests/                         # Auto-generated catalogs
│       │   ├── agents.json                    # System agent inventory
│       │   ├── tools.json                     # MCP tools catalog
│       │   └── services.json                  # Service registry
│       │
│       └── README.md                          # KB catalog guide
│
├── weave/                                     # 🔶 TIER 2: Transformation
│   │                                          # Python code for integration
│   ├── builders/                              # Composition engines
│   │   ├── prompts/
│   │   │   ├── __init__.py
│   │   │   ├── base_builder.py                # Base prompt builder
│   │   │   ├── onboarding_builder.py          # Builds onboarding prompts
│   │   │   ├── classification_builder.py      # Builds classification prompts
│   │   │   ├── contract_builder.py            # Builds contract prompts
│   │   │   └── README.md
│   │   │
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   ├── pydantic_generator.py          # JSON → Pydantic
│   │   │   ├── typescript_generator.py        # JSON → TypeScript
│   │   │   ├── zod_generator.py               # JSON → Zod
│   │   │   ├── drizzle_generator.py           # JSON → Drizzle
│   │   │   └── README.md
│   │   │
│   │   └── examples/
│   │       ├── __init__.py
│   │       ├── config_to_examples.py          # Config → Training examples
│   │       ├── synthetic_generator.py         # Synthetic data generation
│   │       └── README.md
│   │
│   ├── embedders/                             # Vector generation
│   │   ├── __init__.py
│   │   ├── prompt_embedder.py                 # Embed prompts for retrieval
│   │   ├── example_embedder.py                # Embed examples for RAG
│   │   ├── config_embedder.py                 # Embed configs as knowledge
│   │   └── README.md
│   │
│   ├── retrievers/                            # Intelligent retrieval
│   │   ├── __init__.py
│   │   ├── prompt_retriever.py                # Retrieve similar prompts
│   │   ├── example_retriever.py               # Retrieve relevant examples
│   │   ├── semantic_matcher.py                # Semantic similarity
│   │   └── README.md
│   │
│   ├── knowledge/                             # Intelligence layer
│   │   ├── __init__.py
│   │   ├── intent/                            # Intent classification
│   │   │   ├── classifier.py
│   │   │   └── router.py
│   │   ├── retrieval/                         # RAG operations
│   │   │   ├── rag_engine.py
│   │   │   └── context_builder.py
│   │   └── templates/                         # Dynamic templates
│   │       ├── template_engine.py
│   │       └── variable_injector.py
│   │
│   ├── storage/                               # Multi-storage abstraction
│   │   ├── __init__.py
│   │   ├── postgres_client.py                 # PostgreSQL operations
│   │   ├── langmem_client.py                  # LangMem operations
│   │   ├── redis_client.py                    # Redis operations
│   │   ├── supabase_client.py                 # Supabase operations
│   │   └── README.md
│   │
│   └── README.md                              # Weave layer guide
│
├── views/                                     # 🔸 TIER 3: Materialized
│   │                                          # Generated outputs, queryable
│   ├── prompts/                               # Generated final prompts
│   │   ├── agents/
│   │   │   ├── tier-classifier.v2.md          # AUTO-GENERATED
│   │   │   ├── contract-assembler.v3.md
│   │   │   └── questionnaire-extractor.v1.md
│   │   │
│   │   ├── workflows/
│   │   │   ├── onboarding-workflow.v1.md
│   │   │   └── contract-generation.v2.md
│   │   │
│   │   └── README.md                          # Usage: Don't edit!
│   │
│   ├── onboarding/                            # Pipeline materialized views
│   │   ├── 02-ingest-validate/
│   │   │   ├── outputs/                       # Generated outputs
│   │   │   ├── cache/                         # Processed cache
│   │   │   └── README.md
│   │   │
│   │   ├── 06-suggest-tiers/
│   │   │   ├── outputs/
│   │   │   │   ├── tier-suggestions.json
│   │   │   │   └── scoring-results.json
│   │   │   └── README.md
│   │   │
│   │   └── 07-assemble-contract/
│   │       ├── outputs/
│   │       │   ├── contracts/                 # Generated PDFs
│   │       │   └── markdown/                  # Markdown versions
│   │       └── README.md
│   │
│   ├── embeddings/                            # Vector stores (runtime)
│   │   ├── prompt_vectors/                    # Embedded prompts
│   │   ├── example_vectors/                   # Embedded examples
│   │   ├── config_vectors/                    # Embedded configs
│   │   └── README.md
│   │
│   └── README.md                              # Views layer guide
│
├── scripts/                                   # 🛠️ Orchestration scripts
│   ├── sync/
│   │   ├── sync_to_postgresql.py              # Config → PostgreSQL JSONB
│   │   ├── sync_to_langmem.py                 # Examples → LangMem vectors
│   │   ├── sync_to_redis.py                   # Hot configs → Redis cache
│   │   └── sync_all.py                        # Master sync script
│   │
│   ├── generate/
│   │   ├── generate_schemas.py                # JSON → Pydantic/TS/Zod/Drizzle
│   │   ├── generate_examples.py               # Config → Training examples
│   │   ├── generate_prompts.py                # Components → Final prompts
│   │   └── generate_all.py                    # Master generation script
│   │
│   ├── embed/
│   │   ├── embed_prompts.py                   # Prompts → Vectors
│   │   ├── embed_examples.py                  # Examples → Vectors
│   │   ├── embed_configs.py                   # Configs → Vectors
│   │   └── embed_all.py                       # Master embedding script
│   │
│   └── README.md                              # Scripts usage guide
│
├── tests/                                     # Testing infrastructure
│   ├── test_builders.py                       # Test prompt/schema builders
│   ├── test_generators.py                     # Test example generation
│   ├── test_embeddings.py                     # Test vector operations
│   ├── test_retrieval.py                      # Test RAG pipeline
│   └── README.md
│
└── docs/                                      # Documentation
    ├── ARCHITECTURE.md                        # This file (symlink)
    ├── QUICK_START.md                         # Developer onboarding
    ├── API_REFERENCE.md                       # Code API docs
    └── WORKFLOWS.md                           # Common workflows

🔄 The Complete Data Flow

Flow 1: Schema-Driven Validation Pipeline

# 1. CANONICAL SCHEMA (definitions/schemas/canonical/)
# contract-terms.schema.json (JSON Schema Draft 2020-12)
 
# 2. GENERATE VALIDATORS (shell; drives the generators in weave/builders/schemas/)
python scripts/generate/generate_schemas.py
# → Creates Pydantic, TypeScript, Zod, Drizzle
 
# 3. BACKEND VALIDATION (Application Layer, Python)
from data_layer.definitions.schemas.generated.pydantic import ContractTerms
 
contract = ContractTerms(**llm_output)  # Pydantic validates; raises ValidationError on bad input
 
// 4. FRONTEND VALIDATION (Application Layer, TypeScript)
import { contractTermsSchema } from '@/data_layer/definitions/schemas/generated/zod'
 
const validated = contractTermsSchema.parse(apiResponse)  // Zod validates; throws ZodError on bad input
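
To make the backend validation step concrete without depending on the generated Pydantic models, here is a minimal stdlib-only sketch of the same idea: check an LLM output dict against a schema's `required` and `type` rules. The schema fields (`tier`, `annual_fee`, `notes`) and the `validate` helper are assumptions for illustration; the real pipeline would rely on the generated `ContractTerms` model rather than on this hand-written checker.

```python
from typing import Any

# Hypothetical, heavily simplified stand-in for contract-terms.schema.json.
CONTRACT_TERMS_SCHEMA = {
    "type": "object",
    "required": ["tier", "annual_fee"],
    "properties": {
        "tier": {"type": "string"},
        "annual_fee": {"type": "number"},
        "notes": {"type": "string"},
    },
}

# Map JSON Schema type names to Python types (subset only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(payload: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in payload:
            continue
        expected = _JSON_TYPES[rules["type"]]
        value = payload[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and rules["type"] in ("number", "integer"):
            errors.append(f"{name}: expected {rules['type']}, got boolean")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {rules['type']}, got {type(value).__name__}")
    return errors


good = {"tier": "tier_1", "annual_fee": 25000}
bad = {"annual_fee": "25000"}
print(validate(good, CONTRACT_TERMS_SCHEMA))
print(validate(bad, CONTRACT_TERMS_SCHEMA))
```

Collecting every error instead of failing on the first one mirrors how schema validators usually report problems, which is convenient when the failing payload came from an LLM and needs to be retried with feedback.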

Flow 2: Config-Driven Example Generation

# 1. BUSINESS CONFIG (definitions/config/business/)
# tier_presets.v1.json contains actual pricing values
 
# 2. GENERATE EXAMPLES (weave/builders/examples/)
from weave.builders.examples import config_to_examples
 
examples = config_to_examples(
    config_path="definitions/config/business/pricing/tier_presets.v1.json",
    output_path="definitions/examples/generated/pricing-examples.jsonl"
)
# Creates 50+ training examples in JSONL format
 
# 3. EMBED EXAMPLES (weave/embedders/)
from weave.embedders import example_embedder
 
example_embedder.embed_all(
    input_path="definitions/examples/generated/pricing-examples.jsonl",
    namespace="pricing-examples"
)
# Stores in LangMem for RAG retrieval
 
# 4. RETRIEVE IN CONTEXT (Application Layer)
from weave.retrievers import example_retriever
 
relevant_examples = example_retriever.get_similar(
    query="What tier for a combat league with $2M revenue?",
    namespace="pricing-examples",
    k=5
)
# Returns 5 most relevant examples for few-shot prompting
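
Step 2 of this flow can be sketched as follows. This is an illustration under stated assumptions, not the actual `config_to_examples` module: the config field names (`tiers`, `min_revenue`, `annual_fee`) and the example record shape (`input`, `output`, `source_config_version`) are invented, since the contents of `tier_presets.v1.json` are not shown here.

```python
import json

# Hypothetical excerpt of tier_presets.v1.json; the real field names may differ.
TIER_PRESETS = {
    "version": 1,
    "tiers": {
        "tier_1": {"min_revenue": 1_000_000, "annual_fee": 50_000},
        "tier_2": {"min_revenue": 250_000, "annual_fee": 20_000},
    },
}


def config_to_jsonl(config: dict) -> str:
    """Turn each tier preset into one input/output training example (one JSON object per line)."""
    lines = []
    for tier_name, preset in sorted(config["tiers"].items()):
        example = {
            "input": f"What does {tier_name} cost and who qualifies?",
            "output": (
                f"{tier_name} requires at least ${preset['min_revenue']:,} in revenue "
                f"and costs ${preset['annual_fee']:,} per year."
            ),
            # Record which config version produced the example so stale ones can be found.
            "source_config_version": config["version"],
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)


jsonl = config_to_jsonl(TIER_PRESETS)
print(jsonl)
```

Because the examples are derived from the config rather than written by hand, regenerating them after a pricing change keeps the few-shot data consistent with the values the application actually uses.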

Flow 3: Prompt Component Composition

# 1. PROMPT COMPONENTS (definitions/prompts/components/)
# system_instructions/tier_classifier.md
# few_shot_patterns/classification_pattern.md
# output_formats/json_structure.md
 
# 2. LOAD BUSINESS CONFIG (definitions/config/)
from data_layer.definitions.config.business import load_config
 
scoring_weights = load_config("business/scoring/scoring_model.v1.json")
 
# 3. BUILD DYNAMIC PROMPT (weave/builders/prompts/)
from weave.builders.prompts import classification_builder
 
prompt = classification_builder.build(
    components=[
        "system_instructions/tier_classifier.md",
        "few_shot_patterns/classification_pattern.md"
    ],
    config=scoring_weights,  # Inject actual weights
    examples=relevant_examples  # From retrieval
)
 
# 4. EMBED FOR FUTURE RETRIEVAL (weave/embedders/)
from weave.embedders import prompt_embedder
 
prompt_embedder.embed(
    prompt_text=prompt,
    metadata={
        "type": "classification",
        "version": "2.0",
        "config_version": scoring_weights['version']
    }
)
 
# 5. RETRIEVE SIMILAR PROMPTS LATER
from weave.retrievers import prompt_retriever
 
similar_prompts = prompt_retriever.get_similar(
    query="Need to classify a new league type",
    k=3
)
# Returns 3 most similar historical prompts for reference

Flow 4: Multi-Storage Retrieval Strategy

# APPLICATION NEEDS: Get tier recommendation with reasoning
from langchain_openai import ChatOpenAI
from data_layer.definitions.schemas.generated.pydantic import TierClassification
from weave.builders.prompts import classification_builder
from weave.storage import langmem_client, postgres_client, redis_client


def recommend_tier(league_id: str, league_characteristics: str) -> str:
    """Return the tier recommendation for a league as a JSON string."""
    cache_key = f"tier:league:{league_id}"

    # 1. RETRIEVE FROM REDIS (Hot Cache)
    cached_tier = redis_client.get(cache_key)
    if cached_tier:
        return cached_tier  # Fast path: < 5ms

    # 2. RETRIEVE FROM POSTGRESQL (Structured Query)
    tier_config = postgres_client.query("""
        SELECT config_data->'tiers'->'tier_1' AS tier_1
        FROM business_config
        WHERE config_type = 'tier_presets' AND version = 1
    """)

    # 3. RETRIEVE FROM LANGMEM (Semantic Search)
    relevant_examples = langmem_client.query(
        query=f"Tier recommendation for {league_characteristics}",
        namespace="pricing-examples",
        filters={"type": "tier_recommendation"},
        k=5
    )

    # 4. COMPOSE FINAL PROMPT WITH ALL CONTEXT
    final_prompt = classification_builder.build(
        system_instructions="tier_classifier.md",
        business_config=tier_config,  # From PostgreSQL
        few_shot_examples=relevant_examples,  # From LangMem
        output_schema=TierClassification.model_json_schema()  # From definitions/schemas/
    )

    # 5. LLM GENERATES with Pydantic Validation
    llm = ChatOpenAI(model="gpt-4")
    structured_llm = llm.with_structured_output(TierClassification)
    result = structured_llm.invoke(final_prompt)  # Returns validated Pydantic model

    # 6. CACHE RESULT
    result_json = result.model_dump_json()
    redis_client.set(cache_key, result_json, ex=3600)  # 1 hour TTL

    # 7. SEND TO FRONTEND (Zod validates there)
    # Frontend receives JSON, validates with Zod schema
    return result_json
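
The cache-aside part of this flow (steps 1 and 6) can be exercised in isolation. The sketch below uses an in-memory `TTLCache` class as a stand-in for the Redis client; `TTLCache`, `get_tier`, and `slow_classifier` are hypothetical names introduced for illustration, and the slow path is reduced to a single callable.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for the Redis hot cache (not a real Redis client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # Expired: behave like a cache miss.
            return None
        return value

    def set(self, key: str, value: str, ex: float) -> None:
        self._store[key] = (self._clock() + ex, value)


def get_tier(league_id: str, cache: TTLCache, compute: Callable[[str], str]) -> str:
    """Cache-aside lookup: try the cache, fall back to the slow path, then store."""
    key = f"tier:league:{league_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = compute(league_id)  # Slow path: database, vector search, and LLM call.
    cache.set(key, result, ex=3600)
    return result


calls = []


def slow_classifier(league_id: str) -> str:
    calls.append(league_id)
    return '{"tier": "tier_1"}'


cache = TTLCache()
first = get_tier("league-42", cache, slow_classifier)
second = get_tier("league-42", cache, slow_classifier)
print(first == second, len(calls))
```

Injecting the clock makes expiry testable without sleeping, and passing the slow path as a callable keeps the caching logic independent of which stores sit behind it.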

🎨 Key Design Patterns

Pattern 1: Single Source, Multiple Views

tier_presets.v1.json (SINGLE SOURCE)
    │
    ├─→ PostgreSQL JSONB (queryable)
    ├─→ LangMem vectors (semantic)
    ├─→ Redis JSON (cached)
    ├─→ Training examples JSONL (few-shot)
    └─→ API response templates (runtime)

Benefit: Update the source file once and re-run the sync scripts; every view picks up the change
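
A minimal sketch of this fan-out, assuming each view can be modelled as a writer function that receives the same canonical config. The dicts and the list below are in-memory stand-ins for PostgreSQL, Redis, and the JSONL example file; the writer names and the config fields are hypothetical.

```python
import json
from typing import Callable

# Hypothetical excerpt of tier_presets.v1.json.
SOURCE = {"version": 1, "tiers": {"tier_1": {"annual_fee": 50_000}}}

# In-memory stand-ins for the real stores; each "view" is just a container here.
postgres_rows: dict[str, dict] = {}
redis_cache: dict[str, str] = {}
example_lines: list[str] = []


def to_postgres(name: str, config: dict) -> None:
    postgres_rows[name] = config  # Would be a JSONB upsert.


def to_redis(name: str, config: dict) -> None:
    redis_cache[f"config:{name}"] = json.dumps(config)  # Would be a SET with a TTL.


def to_examples(name: str, config: dict) -> None:
    for tier, preset in config["tiers"].items():
        example_lines.append(json.dumps({"tier": tier, **preset}))


VIEW_WRITERS: list[Callable[[str, dict], None]] = [to_postgres, to_redis, to_examples]


def sync(name: str, config: dict) -> None:
    """Push one canonical config to every registered view."""
    for write in VIEW_WRITERS:
        write(name, config)


sync("tier_presets.v1", SOURCE)
print(sorted(postgres_rows), sorted(redis_cache), len(example_lines))
```

Registering writers in one list means adding a new view is a one-line change, and no view is ever edited by hand.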


Pattern 2: Schema-Driven Everything

contract-terms.schema.json (CANONICAL)
    │
    ├─→ Pydantic model (backend validation)
    ├─→ TypeScript types (frontend types)
    ├─→ Zod schema (frontend validation)
    ├─→ Drizzle schema (database ORM)
    └─→ Documentation (auto-generated)

Benefit: Type safety across the entire stack
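
To show what one of these derivations involves, here is a small sketch that renders a flat JSON Schema object as a TypeScript interface. It is not the project's `typescript_generator.py`: the schema contents and the `to_typescript` helper are assumptions, and only primitive property types are handled.

```python
# Hypothetical, simplified stand-in for a canonical schema file.
SCHEMA = {
    "title": "ContractTerms",
    "type": "object",
    "required": ["tier", "annual_fee"],
    "properties": {
        "tier": {"type": "string"},
        "annual_fee": {"type": "number"},
        "auto_renew": {"type": "boolean"},
    },
}

# JSON Schema primitive types mapped to TypeScript types (subset only).
TS_TYPES = {"string": "string", "number": "number", "integer": "number", "boolean": "boolean"}


def to_typescript(schema: dict) -> str:
    """Render a flat object schema as a TypeScript interface."""
    required = set(schema.get("required", []))
    lines = [f"export interface {schema['title']} {{"]
    for name, rules in schema["properties"].items():
        # Properties missing from "required" become optional members.
        optional = "" if name in required else "?"
        lines.append(f"  {name}{optional}: {TS_TYPES[rules['type']]};")
    lines.append("}")
    return "\n".join(lines)


print(to_typescript(SCHEMA))
```

The same traversal, with a different type table and output template, is what each of the other targets (Pydantic, Zod, Drizzle) would need, which is why a single canonical schema can feed all of them.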


Pattern 3: Component-Based Prompt Assembly

# Components (small, reusable)
system_instruction = load("system_instructions/tier_classifier.md")
few_shot_pattern = load("few_shot_patterns/classification_pattern.md")
output_format = load("output_formats/json_structure.md")
 
# Config (actual values)
weights = load_config("business/scoring/scoring_model.v1.json")
 
# Examples (context)
examples = retrieve_examples(
    query="tier classification",
    k=5
)
 
# BUILD final prompt: inject weights, then examples, into the few-shot pattern once
few_shot_block = inject_examples(inject_weights(few_shot_pattern, weights), examples)
final_prompt = compose(
    system_instruction,
    few_shot_block,
    output_format
)

Benefit: Prompts are dynamic, data-driven, and testable
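
A runnable sketch of this assembly, using `string.Template` from the standard library in place of the project's template engine. The component texts, the weight names, and this `compose` signature are all invented for illustration; the real components live in files under definitions/prompts/components/.

```python
import json
from string import Template

# Inline stand-ins for the component files. All contents here are invented.
SYSTEM_INSTRUCTION = "You classify sports leagues into pricing tiers."
FEW_SHOT_PATTERN = Template("Scoring weights: $weights\nExamples:\n$examples")
OUTPUT_FORMAT = 'Respond with JSON: {"tier": "<tier name>"}'


def compose(weights: dict, examples: list[dict]) -> str:
    """Assemble the final prompt from static components plus injected data."""
    rendered_examples = "\n".join(
        f"- {ex['input']} -> {ex['output']}" for ex in examples
    )
    few_shot = FEW_SHOT_PATTERN.substitute(
        weights=json.dumps(weights, sort_keys=True),  # Stable key order keeps prompts reproducible.
        examples=rendered_examples,
    )
    return "\n\n".join([SYSTEM_INSTRUCTION, few_shot, OUTPUT_FORMAT])


prompt = compose(
    weights={"revenue": 0.6, "audience": 0.4},
    examples=[{"input": "Combat league, $2M revenue", "output": "tier_1"}],
)
print(prompt)
```

Because `compose` is a pure function of its inputs, a unit test can assert that a given config and example set produce exactly the expected prompt text, which is what makes this style of prompt testable.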


Pattern 4: Embedded Retrieval Everywhere

# Everything can be retrieved semantically:
 
# 1. Retrieve similar prompts
similar_prompts = retrieve_prompts(
    "How to classify combat sports?",
    namespace="prompts"
)
 
# 2. Retrieve relevant examples
relevant_examples = retrieve_examples(
    "Tier 1 combat league pricing",
    namespace="examples"
)
 
# 3. Retrieve business rules
business_context = retrieve_configs(
    "Combat sports pricing rules",
    namespace="business-rules"
)
 
# 4. Compose everything into final prompt
final_prompt = compose_with_retrieval(
    query="Classify new MMA league",
    prompt_template=similar_prompts[0],
    examples=relevant_examples[:5],
    config=business_context
)

Benefit: The LLM pipeline can retrieve the prompts, examples, and business rules relevant to each request
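
The retrieval step behind each of these calls reduces to "embed the query, rank stored items by similarity, keep the top k". The toy sketch below swaps a learned embedding model for plain word counts, so it only illustrates the ranking mechanics; `embed`, `cosine`, `retrieve`, and the sample documents are hypothetical and unrelated to the LangMem API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


# Invented documents standing in for one namespace of stored examples.
DOCS = [
    "tier 1 combat league pricing example",
    "onboarding questionnaire extraction example",
    "combat sports pricing rules",
]
print(retrieve("combat league pricing", DOCS, k=2))
```

Word counts only match exact tokens, so "MMA" would not match "combat" here; closing that gap is the reason for using vector embeddings in the actual system.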


🚀 Implementation Scripts

Master Sync Script

# scripts/sync/sync_all.py
"""
Master orchestration: SOURCE_OF_TRUTH → RUNTIME SYSTEMS
"""
import asyncio
from pathlib import Path
from weave.storage import postgres_client, langmem_client, redis_client
 
async def sync_all():
    """Sync everything from definitions/ to runtime"""
    
    print("🔄 Starting multi-storage sync...")
    
    # 1. Sync configs to PostgreSQL (JSONB)
    print("  📊 Syncing to PostgreSQL...")
    await postgres_client.sync_configs(
        source_dir=Path("data_layer/definitions/config")
    )
    
    # 2. Sync examples to LangMem (vectors)
    print("  🧠 Syncing to LangMem...")
    await langmem_client.sync_examples(
        source_dir=Path("data_layer/definitions/examples")
    )
    
    # 3. Cache hot configs in Redis
    print("  ⚡ Caching in Redis...")
    await redis_client.cache_hot_configs(
        configs=["tier_presets.v1", "scoring_model.v1"]
    )
    
    # 4. Embed prompts for retrieval
    print("  🔍 Embedding prompts...")
    from weave.embedders import prompt_embedder
    await prompt_embedder.embed_all(
        source_dir=Path("data_layer/views/prompts")
    )
    
    print("✅ Sync complete!")
 
if __name__ == "__main__":
    asyncio.run(sync_all())

Master Generation Script

# scripts/generate/generate_all.py
"""
Generate all derived artifacts from SOURCE_OF_TRUTH
"""
from pathlib import Path
from weave.builders import schemas, examples, prompts
 
def generate_all():
    """Generate schemas, examples, and prompts"""
    
    print("🏗️  Generating all artifacts...")
    
    # 1. Generate schema adapters
    print("  📋 Generating schemas...")
    schemas.pydantic_generator.generate_all(
        source=Path("data_layer/definitions/schemas/canonical"),
        output=Path("data_layer/definitions/schemas/generated/pydantic")
    )
    schemas.zod_generator.generate_all(
        source=Path("data_layer/definitions/schemas/canonical"),
        output=Path("data_layer/definitions/schemas/generated/zod")
    )
    schemas.drizzle_generator.generate_all(
        source=Path("data_layer/definitions/schemas/canonical"),
        output=Path("data_layer/definitions/schemas/generated/drizzle")
    )
    
    # 2. Generate examples from configs
    print("  🎯 Generating examples...")
    examples.config_to_examples.generate_from_configs(
        config_dir=Path("data_layer/definitions/config/business"),
        output_dir=Path("data_layer/definitions/examples/generated")
    )
    
    # 3. Build final prompts from components
    print("  📝 Building prompts...")
    prompts.onboarding_builder.build_all(
        components_dir=Path("data_layer/definitions/prompts/components"),
        config_dir=Path("data_layer/definitions/config"),
        output_dir=Path("data_layer/views/prompts")
    )
    
    print("✅ Generation complete!")
 
if __name__ == "__main__":
    generate_all()
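To make the schema-generation step more concrete, here is a minimal sketch of turning a canonical JSON Schema into Pydantic model source code. It is not the project's `pydantic_generator`: the `render_pydantic_model` helper is hypothetical, it only handles flat objects with primitive types, and the `TierClassification` fields shown are invented for the example.

```python
# Maps JSON Schema primitive types to Python annotations.
TYPE_MAP = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def render_pydantic_model(schema: dict) -> str:
    """Render Pydantic model source code from a flat JSON Schema object."""
    required = set(schema.get("required", []))
    lines = [
        "from typing import Optional",
        "",
        "from pydantic import BaseModel",
        "",
        "",
        f"class {schema['title']}(BaseModel):",
    ]
    properties = schema.get("properties", {})
    if not properties:
        lines.append("    pass")
    for name, spec in properties.items():
        annotation = TYPE_MAP[spec["type"]]
        if name in required:
            lines.append(f"    {name}: {annotation}")
        else:
            # Optional fields default to None so callers may omit them.
            lines.append(f"    {name}: Optional[{annotation}] = None")
    return "\n".join(lines) + "\n"


# Hypothetical schema; the real canonical schema may differ.
schema = {
    "title": "TierClassification",
    "type": "object",
    "properties": {
        "tier": {"type": "string"},
        "confidence": {"type": "number"},
        "rationale": {"type": "string"},
    },
    "required": ["tier", "confidence"],
}
source = render_pydantic_model(schema)
print(source)
```

Generating source text (rather than building classes at runtime) keeps the output reviewable in Git, which fits the "generated/" directories used above.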

🧪 Usage Examples

Example 1: Complete LLM Pipeline with Validation

from langchain_openai import ChatOpenAI
from data_layer.weave.builders.prompts import classification_builder
from data_layer.weave.retrievers import example_retriever
from data_layer.definitions.schemas.generated.pydantic import TierClassification
 
async def classify_league(league_data: dict) -> TierClassification:
    """
    Complete classification pipeline:
    1. Retrieve relevant examples (embedded)
    2. Build prompt (component composition)
    3. Generate with LLM
    4. Validate with Pydantic
    5. Return type-safe result
    """
    
    # 1. Retrieve similar examples
    relevant_examples = await example_retriever.get_similar(
        query=f"Classify {league_data['sport']} league",
        namespace="tier-classification",
        k=5
    )
    
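    # 2. Build prompt with components + config + examples
    # NOTE: load_config is not defined in this snippet; it is assumed to be a
    # project helper that reads a JSON file from data_layer/definitions/config/.
    prompt = classification_builder.build(
        system_instructions="tier_classifier.md",
        few_shot_examples=relevant_examples,
        config_weights=load_config("scoring_model.v1.json"),
        input_data=league_data
    )
    
    # 3. Generate with structured output
    llm = ChatOpenAI(model="gpt-4")
    structured_llm = llm.with_structured_output(TierClassification)
    
    # 4-5. Use the async call inside this coroutine so the event loop is not
    # blocked; the result is already a validated TierClassification instance.
    result = await structured_llm.ainvoke(prompt)
    
    return result  # Type-safe TierClassification object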

Example 2: Frontend Receives Validated Data

// Frontend receives API response
import { contractTermsSchema } from '@/data_layer/definitions/schemas/generated/zod';
import type { ContractTerms } from '@/data_layer/definitions/schemas/generated/typescript';
 
async function fetchContract(leagueId: string): Promise<ContractTerms> {
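  const response = await fetch(`/api/contracts/${leagueId}`);
  if (!response.ok) {
    // Fail early instead of trying to parse an error page as a contract
    throw new Error(`Failed to fetch contract ${leagueId}: HTTP ${response.status}`);
  }
  const data = await response.json();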
  
  // Zod validates at runtime
  const validated = contractTermsSchema.parse(data);
  
  // TypeScript types ensure compile-time safety
  return validated;  // Type: ContractTerms
}

Example 3: Retrieve Prompts from Embedded Space

import asyncio

from weave.retrievers import prompt_retriever


async def find_similar_prompts():
    # Find similar prompts for new task
    similar_prompts = await prompt_retriever.get_similar(
        query="Need to extract racing event data from PDF",
        namespace="prompts",
        filters={"category": "extraction"},
        k=3
    )

    # Use as reference or starting point
    for prompt in similar_prompts:
        print(f"Similar prompt: {prompt.metadata['title']}")
        print(f"Similarity: {prompt.score}")
        print(f"Content preview: {prompt.content[:200]}...")


# `await` is only valid inside a coroutine, so run the lookup with asyncio.run
asyncio.run(find_similar_prompts())

📊 Storage Strategy Matrix

| Data Type | Source Location | Generated To | Queryable Via | Use Case |
|-----------|-----------------|--------------|---------------|----------|
| Config Files | definitions/config/ | PostgreSQL (JSONB) | SQL queries | Business rules lookup |
| Config Files | definitions/config/ | LangMem (vectors) | Semantic search | RAG context |
| Config Files | definitions/config/ | Redis (JSON) | Key-value | Hot data cache |
| Examples | definitions/examples/seeds/ | LangMem (vectors) | Semantic search | Few-shot learning |
| Examples | definitions/examples/generated/ | LangMem (vectors) | Semantic search | Training data |
| Prompts | views/prompts/ | LangMem (vectors) | Semantic search | Prompt retrieval |
| Schemas | definitions/schemas/canonical/ | Git | File system | Single source |
| Pydantic | definitions/schemas/generated/pydantic/ | Git | Python import | Backend validation |
| Zod | definitions/schemas/generated/zod/ | Git | TypeScript import | Frontend validation |
| Drizzle | definitions/schemas/generated/drizzle/ | Git | TypeScript import | ORM operations |
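The runtime rows of the matrix can be read as a routing table from (data type, access pattern) to a store. The sketch below encodes that reading; the `STORAGE_ROUTES` mapping and `pick_store` helper are hypothetical names introduced here, and the lowercase labels are assumptions rather than identifiers from the codebase.

```python
# (data type, access pattern) -> runtime store, following the matrix above.
STORAGE_ROUTES = {
    ("config", "sql"): "postgresql",
    ("config", "semantic"): "langmem",
    ("config", "key-value"): "redis",
    ("example", "semantic"): "langmem",
    ("prompt", "semantic"): "langmem",
}


def pick_store(data_type: str, access_pattern: str) -> str:
    """Return the runtime store for a data type and access pattern."""
    try:
        return STORAGE_ROUTES[(data_type, access_pattern)]
    except KeyError:
        raise ValueError(
            f"No runtime store for {data_type!r} via {access_pattern!r}"
        ) from None


print(pick_store("config", "key-value"))
```

Schemas and their generated adapters are deliberately absent from the mapping: per the matrix they live in Git and are imported, not queried at runtime.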

πŸ” Governance & Best Practices

Versioning Strategy

All source files use semantic versioning:
- tier_presets.v1.json → tier_presets.v2.json (breaking changes)
- scoring_model.v1.1.json (minor improvements)
- archetypes.v1.0.1.json (patches)

Version in filename AND inside JSON:
{
  "version": "1.2.0",
  "schemaVersion": "draft-2020-12",
  "lastUpdated": "2025-10-16"
}
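Because the version lives in two places, a small consistency check can catch drift between them. The sketch below assumes one possible rule, namely that the version in the filename must be a prefix of the `"version"` field inside the JSON; that rule and the `check_version` helper are assumptions for illustration, not something the source specifies.

```python
import re
from pathlib import Path

# Matches suffixes such as ".v1.json", ".v1.1.json", or ".v1.0.1.json".
FILENAME_VERSION = re.compile(r"\.v(\d+(?:\.\d+){0,2})\.json$")


def check_version(path, document) -> bool:
    """Return True when the filename version is a prefix of document["version"]."""
    match = FILENAME_VERSION.search(Path(path).name)
    if match is None:
        raise ValueError(f"{path} has no .vN suffix")
    file_parts = match.group(1).split(".")
    doc_parts = str(document["version"]).split(".")
    return doc_parts[: len(file_parts)] == file_parts


print(check_version("tier_presets.v1.json", {"version": "1.2.0"}))
print(check_version("tier_presets.v2.json", {"version": "1.2.0"}))
```

A check like this could run in CI before the generators, so a mismatched file never reaches the runtime stores.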

Change Management

# When you update a config file:
 
# 1. Edit SOURCE_OF_TRUTH
vim data_layer/definitions/config/business/pricing/tier_presets.v1.json
 
# 2. Increment version (semantic versioning, as described above)
# "version": "1.2.0" → "version": "1.3.0"
 
# 3. Run generators
python data_layer/scripts/generate/generate_all.py
 
# 4. Run sync
python data_layer/scripts/sync/sync_all.py
 
# 5. Verify in each system
psql -c "SELECT version FROM business_config WHERE config_type='tier_presets'"
redis-cli GET config:tier_presets:version
# Check LangMem dashboard for new embeddings
 
# 6. Git commit
git add data_layer/definitions/config/business/pricing/tier_presets.v1.json
git commit -m "feat(pricing): Update tier 1 pricing to \$150k (v1.3.0)"

🧪 Testing Strategy

Unit Tests: Test Each Layer

# tests/test_builders.py
def test_prompt_builder_uses_live_config():
    """Ensure prompts load actual config values"""
    from weave.builders.prompts import classification_builder
    
    prompt = classification_builder.build("tier_classifier")
    
    # Should contain actual weights from config
    assert "0.25" in prompt  # market_potential weight
    assert "0.20" in prompt  # data_quality weight
 
def test_example_generation_from_config():
    """Ensure examples generated match config"""
    from weave.builders.examples import config_to_examples
    
    examples = config_to_examples("tier_presets.v1.json")
    
    # Should have example for each tier
    assert len(examples) >= 4  # tier_1 through tier_4
    
    # Should contain actual pricing; str() handles both string and numeric
    # outputs (the original `25000 in output` raises TypeError on a string)
    tier_1_example = next(e for e in examples if 'tier_1' in str(e))
    assert "25000" in str(tier_1_example['output'])

Integration Tests: Test Data Flow

# tests/test_retrieval.py
import pytest


@pytest.mark.asyncio  # assumes the pytest-asyncio plugin is installed
async def test_end_to_end_retrieval():
    """Test complete retrieval flow"""
    
    # 1. Sync data
    from data_layer.scripts.sync import sync_all
    await sync_all.sync_all()
    
    # 2. Retrieve from LangMem
    from weave.retrievers import example_retriever
    examples = await example_retriever.get_similar(
        query="Tier 1 combat league",
        k=3
    )
    
    assert len(examples) == 3
    assert all('tier_1' in str(e) or 'combat' in str(e) for e in examples)
    
    # 3. Use in prompt
    from weave.builders.prompts import classification_builder
    prompt = classification_builder.build(
        examples=examples
    )
    
    assert len(prompt) > 1000  # Substantial prompt

📚 Developer Workflows

Workflow 1: Add New Business Rule

# 1. Create config file
cat > data_layer/definitions/config/business/new_rule.v1.json << 'EOF'
{
  "version": "1.0.0",
  "rule_type": "validation",
  "rules": {
    "minimum_revenue": 100000
  }
}
EOF
 
# 2. Generate examples
python data_layer/scripts/generate/generate_examples.py --config=new_rule.v1.json
 
# 3. Sync to runtime
python data_layer/scripts/sync/sync_all.py
 
# 4. Verify
psql -c "SELECT * FROM business_config WHERE config_type='new_rule'"

Workflow 2: Update Prompt Component

# 1. Edit component
vim data_layer/definitions/prompts/components/system_instructions/my_agent.md
 
# 2. Rebuild prompts that use it
python data_layer/scripts/generate/generate_prompts.py --component=my_agent
 
# 3. Re-embed for retrieval
python data_layer/scripts/embed/embed_prompts.py
 
# 4. Test retrieval
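python -c "
import asyncio
from weave.retrievers import prompt_retriever
# get_similar is a coroutine (it is awaited elsewhere), so run it explicitly
prompts = asyncio.run(prompt_retriever.get_similar('task for my_agent', k=1))
print(prompts[0].content[:200])
"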

Workflow 3: Add Training Example

# 1. Add to seeds (manual)
cat >> data_layer/definitions/examples/seeds/onboarding/tier-classification.jsonl << 'EOF'
{"input": "What tier for Premier Lacrosse League?", "output": "Tier 1 - High revenue, established brand", "metadata": {"tier": "tier_1", "sport": "lacrosse"}}
EOF
 
# 2. Embed into LangMem
python data_layer/scripts/embed/embed_examples.py --file=tier-classification.jsonl
 
# 3. Test retrieval
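python -c "
import asyncio
from weave.retrievers import example_retriever
# get_similar is a coroutine (it is awaited elsewhere), so run it explicitly
examples = asyncio.run(example_retriever.get_similar('tier for lacrosse league', k=1))
print(examples[0].content)
"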
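Before appending to a seed file by hand, it can help to validate the line first, since a malformed JSONL record would only surface later at embedding time. The sketch below checks for the three fields used in the seed record above; the `validate_seed_line` helper and the rule that all three fields are required are assumptions.

```python
import json

# Fields present in the seed record shown in step 1 (assumed to be required).
REQUIRED_KEYS = {"input", "output", "metadata"}


def validate_seed_line(line: str) -> dict:
    """Parse one JSONL line and check it has the fields used in the seed file."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("seed line must be a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(record["metadata"], dict):
        raise ValueError("metadata must be an object")
    return record


line = (
    '{"input": "What tier for Premier Lacrosse League?", '
    '"output": "Tier 1 - High revenue, established brand", '
    '"metadata": {"tier": "tier_1", "sport": "lacrosse"}}'
)
record = validate_seed_line(line)
print(record["metadata"]["tier"])
```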

🎓 Architecture Principles

1. Single Source of Truth

  • All canonical data in definitions/
  • Never edit views/ or runtime systems directly
  • Always regenerate from source

2. Everything is Retrievable

  • Configs → embedded for semantic search
  • Examples → embedded for RAG
  • Prompts → embedded for reuse
  • All have metadata for filtering

3. Type Safety Everywhere

  • JSON Schema → Pydantic (backend)
  • JSON Schema → Zod (frontend)
  • JSON Schema → TypeScript (types)
  • JSON Schema → Drizzle (ORM)

4. Generation Over Duplication

  • Don't copy, generate
  • Don't hardcode, compose
  • Don't scatter, centralize then distribute

5. Multi-Storage Optimization

  • PostgreSQL for structured queries
  • LangMem for semantic search
  • Redis for speed
  • Supabase for auth/realtime
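One common way to combine a fast store with a structured one is a read-through lookup: check the cache first and fall back to the database on a miss. The sketch below illustrates that pattern with plain dicts standing in for Redis and PostgreSQL; the `ConfigReader` class and the `config:<type>:data` key are assumptions, not part of the documented codebase.

```python
class ConfigReader:
    """Read-through lookup: try the cache first, then the database."""

    def __init__(self, cache, database):
        self.cache = cache        # stand-in for Redis (a dict here)
        self.database = database  # stand-in for PostgreSQL (a dict here)

    def get(self, config_type):
        key = f"config:{config_type}:data"  # assumed key layout
        if key in self.cache:
            return self.cache[key]
        value = self.database[config_type]  # raises KeyError if unknown
        self.cache[key] = value             # populate the cache for next time
        return value


reader = ConfigReader(cache={}, database={"tier_presets": {"version": "1.2.0"}})
first = reader.get("tier_presets")   # served from the database stand-in
second = reader.get("tier_presets")  # served from the cache stand-in
print(first == second)
```

Since runtime stores are regenerated from definitions/ by the sync script, a real version would also need the sync step to refresh or invalidate cached entries.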

📖 Quick Reference Card

┌──────────────────────────────────────────────────────────────┐
│                         I NEED TO...                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Add pricing rule → definitions/config/business/pricing/     │
│  Add scoring weight → definitions/config/business/scoring/   │
│  Add JSON Schema → definitions/schemas/canonical/            │
│  Add training example → definitions/examples/seeds/          │
│  Add prompt component → definitions/prompts/components/      │
│                                                              │
│  Build prompt → weave/builders/prompts/                      │
│  Generate examples → weave/builders/examples/                │
│  Generate Pydantic → weave/builders/schemas/                 │
│  Embed for RAG → weave/embedders/                            │
│  Retrieve examples → weave/retrievers/                       │
│                                                              │
│  Query business rules → views/ → PostgreSQL                  │
│  Semantic search → views/ → LangMem                          │
│  Fast access → views/ → Redis                                │
│                                                              │
│  Sync everything → scripts/sync/sync_all.py                  │
│  Generate everything → scripts/generate/generate_all.py      │
│  Embed everything → scripts/embed/embed_all.py               │
│                                                              │
└──────────────────────────────────────────────────────────────┘

🎯 Success Criteria

After full implementation:

✅ Discoverability: Any developer finds source data in < 30 seconds
✅ Consistency: Zero manual edits to runtime systems
✅ Type Safety: 100% schema coverage (Pydantic + Zod)
✅ Retrieval: < 100ms semantic search across all data
✅ Validation: Backend (Pydantic) + Frontend (Zod) from same source
✅ Prompts: Dynamic composition with live config injection
✅ Examples: Embedded for intelligent few-shot selection
✅ Caching: Hot paths < 10ms via Redis
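The latency targets above (< 100ms retrieval, < 10ms hot paths) can be checked with a small timing harness. The sketch below is one possible approach using only the standard library; the `within_budget` helper is hypothetical, and measuring the worst case over a fixed number of repeats is an assumption about how the criteria would be evaluated.

```python
import time


def within_budget(func, budget_ms, repeats=20):
    """Return (ok, worst_ms): whether every call finished within budget_ms."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        worst = max(worst, (time.perf_counter() - start) * 1000)
    return worst <= budget_ms, worst


# Stand-in for a cached lookup; replace with a real Redis or LangMem call.
hot_cache = {"config:tier_presets:version": "1.2.0"}
ok, worst_ms = within_budget(
    lambda: hot_cache["config:tier_presets:version"], budget_ms=10
)
print(f"within budget: {ok}, worst call: {worst_ms:.3f} ms")
```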


Next Steps: See MIGRATION_GUIDE_PRACTICAL.md for step-by-step implementation

