Source: data_layer/docs/COMPREHENSIVE_ORGANIZATION_PLAN.md
# 🎯 Comprehensive Knowledge Organization Plan

## Executive Summary

This plan unifies FOUR interconnected knowledge systems into a coherent, discoverable architecture:

- `database/prompts/` - Prompt components, builders, and generation
- `database/storage/` - Runtime operational data (code-based examples module)
- `database/knowledge/` - Retrieval, embeddings, intent (code modules)
- `database/kb_catalog/` - Business rules, registries, manifests

**Core Philosophy**: Organize by LIFECYCLE STAGE (source → runtime → queryable), not by data type.
## 🏗️ Proposed Architecture

### Level 1: Source of Truth (Version-Controlled Definitions)

```
database/
└── SOURCE_OF_TRUTH/                      # NEW: Git-tracked, canonical data
    ├── schemas/                          # ✅ Already exists (keep as-is)
    │   ├── canonical/                    # JSON Schema (Draft 2020-12)
    │   ├── generated/                    # Auto-generated adapters
    │   └── domain/v1/drizzle/            # ✅ Drizzle schemas
    │
    ├── config/                           # NEW: Business configuration files
    │   ├── business/
    │   │   ├── pricing/
    │   │   │   ├── tier_presets.v1.json    # ← MOVE FROM output-styles/config
    │   │   │   ├── combat.pricing.v1.json  # ← MOVE FROM output-styles/config
    │   │   │   └── README.md
    │   │   ├── scoring/
    │   │   │   ├── scoring_model.v1.json   # ← MOVE FROM output-styles/config
    │   │   │   └── README.md
    │   │   └── README.md
    │   ├── sports/                       # NEW: Sport-specific configs
    │   │   ├── archetypes.json
    │   │   ├── betting_markets.json
    │   │   └── README.md
    │   └── README.md                     # Config governance doc
    │
    ├── prompts/                          # NEW: Static prompt templates
    │   ├── templates/                    # Jinja2/Mustache templates
    │   │   ├── onboarding/
    │   │   ├── classification/
    │   │   └── contract_generation/
    │   ├── components/                   # Reusable prompt fragments
    │   │   ├── system_instructions/
    │   │   ├── few_shot_examples/
    │   │   └── output_formats/
    │   └── README.md                     # Template usage guide
    │
    └── examples/                         # NEW: Training/reference examples
        ├── onboarding/
        │   ├── questionnaire_extraction/
        │   │   ├── examples.jsonl        # LangMem-ready format
        │   │   ├── metadata.json
        │   │   └── README.md
        │   ├── tier_classification/
        │   │   ├── examples.jsonl
        │   │   └── generated_from_config.jsonl   # AUTO-GENERATED
        │   └── contract_assembly/
        │       ├── examples.jsonl
        │       └── README.md
        ├── sports_classification/
        │   ├── by_archetype/
        │   └── by_market_readiness/
        └── README.md                     # Example governance
```

### Level 2: Runtime Services (Operational Layer)
```
database/
├── knowledge/                # ✅ Keep as-is (Python modules)
│   ├── embeddings/           # Vector generation
│   ├── intent/               # Query classification
│   ├── retrieval/            # RAG operations
│   ├── storage/              # Vector DB interface
│   └── templates/            # Dynamic prompt assembly
│
├── storage/                  # ✅ Keep as-is (Python modules)
│   ├── examples/             # Code module for example access
│   ├── postgres/             # PostgreSQL operations
│   ├── redis/                # Cache layer
│   └── supabase/             # Supabase operations
│
└── prompts/                  # ✅ Enhance (add builders/)
    ├── builders/             # NEW: Prompt construction code
    │   ├── onboarding_prompts.py
    │   ├── classification_prompts.py
    │   └── contract_prompts.py
    ├── registry/             # Prompt metadata
    └── README.md
```

### Level 3: Business Intelligence (Queryable Layer)
```
database/
├── kb_catalog/               # ✅ Keep as-is (enhanced)
│   ├── manifests/            # System inventories
│   ├── registry/             # Component registries
│   ├── constants/            # Enum-like data
│   └── config/               # Catalog configuration
│
└── output-styles/            # ✅ Restructure (remove config/)
    ├── onboarding/           # Pipeline stages
    │   ├── 02-ingest-validate-questionnaire/
    │   ├── 03-enhance-documents/
    │   ├── 04-classify/
    │   ├── 05-upsert-and-crossref/
    │   ├── 06-suggest-tiers-and-terms/
    │   │   ├── example_seeds/    # ← Keep (synthetic seeds)
    │   │   ├── examples/         # ← Keep (generated outputs)
    │   │   ├── generate/         # ← Keep (generation code)
    │   │   ├── models/
    │   │   ├── schema/
    │   │   └── README.md
    │   ├── 07-assemble-contract/
    │   ├── 07a-output-contract-export/
    │   ├── 07b-output-gamekeeper-scorekeeper-ui/
    │   └── 07c-output-marketing-nxt-onboarding-materials/
    └── README-ORGANIZATION.md
```

## 📦 Migration Plan
### Phase 1: Create New Structure (No Deletions)

```bash
# 1. Create SOURCE_OF_TRUTH hierarchy
mkdir -p database/SOURCE_OF_TRUTH/{config,prompts,examples}
mkdir -p database/SOURCE_OF_TRUTH/config/business/{pricing,scoring}
mkdir -p database/SOURCE_OF_TRUTH/config/sports
mkdir -p database/SOURCE_OF_TRUTH/prompts/{templates,components}
mkdir -p database/SOURCE_OF_TRUTH/examples/onboarding/{questionnaire_extraction,tier_classification,contract_assembly}

# 2. Copy (don't move yet) config files
cp database/output-styles/config/business/pricing/tier_presets.v1.json \
   database/SOURCE_OF_TRUTH/config/business/pricing/
cp database/output-styles/config/business/pricing/combat.pricing.v1.json \
   database/SOURCE_OF_TRUTH/config/business/pricing/
cp database/output-styles/config/business/scoring/scoring_model.v1.json \
   database/SOURCE_OF_TRUTH/config/business/scoring/

# 3. Create README files
touch database/SOURCE_OF_TRUTH/README.md
touch database/SOURCE_OF_TRUTH/config/README.md
touch database/SOURCE_OF_TRUTH/config/business/README.md
touch database/SOURCE_OF_TRUTH/prompts/README.md
touch database/SOURCE_OF_TRUTH/examples/README.md
```

### Phase 2: Generate Derived Examples
```python
# database/scripts/generate_examples_from_configs.py
"""
Generate training examples from SOURCE_OF_TRUTH configs.
Output to SOURCE_OF_TRUTH/examples/ in JSONL format.
"""
import json
from pathlib import Path


def format_pricing_response(tier_data: dict) -> str:
    """Render a tier's pricing terms as text (placeholder implementation)."""
    return json.dumps(tier_data.get("pricing", tier_data), indent=2)


def format_justification(tier_data: dict) -> str:
    """Render a short justification for recommending a tier (placeholder)."""
    return tier_data.get("justification", "matches the league's category profile")


def generate_tier_examples():
    """Convert tier_presets.v1.json into LangMem-ready examples."""
    config_path = Path("database/SOURCE_OF_TRUTH/config/business/pricing/tier_presets.v1.json")
    output_path = Path("database/SOURCE_OF_TRUTH/examples/onboarding/tier_classification/generated_from_config.jsonl")

    with open(config_path) as f:
        config = json.load(f)

    examples = []
    for tier_name, tier_data in config['tiers'].items():
        # Example 1: Pricing lookup
        examples.append({
            "input": f"What are the pricing terms for {tier_name}?",
            "output": format_pricing_response(tier_data),
            "metadata": {
                "tier": tier_name,
                "type": "pricing_lookup",
                "source": "tier_presets.v1.json",
                "version": config['version']
            }
        })

        # Example 2: Tier recommendation
        if "example_category" in tier_data:
            examples.append({
                "input": f"What tier should I recommend for a {tier_data['example_category']} league?",
                "output": f"Recommend {tier_name} because: {format_justification(tier_data)}",
                "metadata": {
                    "tier": tier_name,
                    "type": "tier_recommendation",
                    "category": tier_data['example_category']
                }
            })

    # Write as JSONL
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, 'w') as f:
        for ex in examples:
            f.write(json.dumps(ex) + '\n')

    print(f"✅ Generated {len(examples)} examples to {output_path}")


if __name__ == "__main__":
    generate_tier_examples()
```

### Phase 3: Sync to Runtime Systems
```python
# database/scripts/sync_to_runtime_systems.py
"""
Sync SOURCE_OF_TRUTH data to:
- PostgreSQL (JSONB for querying)
- LangMem (vector embeddings for RAG)
- Redis (cache for hot data)
"""
import json
import os
from pathlib import Path

import psycopg2
import redis
from langmem import LangMemClient

DATABASE_URL = os.environ["DATABASE_URL"]


def sync_configs_to_databases():
    """Multi-storage sync strategy."""
    # 1. PostgreSQL: Queryable business rules
    pg = psycopg2.connect(DATABASE_URL)
    cur = pg.cursor()
    config_files = Path("database/SOURCE_OF_TRUTH/config").rglob("*.json")
    for config_file in config_files:
        with open(config_file) as f:
            data = json.load(f)

        # Upsert as versioned JSONB
        cur.execute("""
            INSERT INTO business_config
                (config_type, version, file_path, config_data, updated_at)
            VALUES (%s, %s, %s, %s, NOW())
            ON CONFLICT (config_type, version)
            DO UPDATE SET
                config_data = EXCLUDED.config_data,
                updated_at = NOW()
        """, (
            config_file.stem,
            data.get('version', 1),
            str(config_file),
            json.dumps(data)
        ))
    pg.commit()

    # 2. LangMem: Semantic search
    langmem = LangMemClient(namespace="business-rules")
    example_files = Path("database/SOURCE_OF_TRUTH/examples").rglob("*.jsonl")
    for example_file in example_files:
        with open(example_file) as f:
            for line in f:
                example = json.loads(line)
                langmem.store(
                    content=f"{example['input']}\n\n{example['output']}",
                    metadata={
                        **example.get('metadata', {}),
                        "source_file": str(example_file)
                    }
                )

    # 3. Redis: cache hot configs for fast reads
    cache = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
    for config_file in Path("database/SOURCE_OF_TRUTH/config").rglob("*.json"):
        cache.set(f"config:{config_file.stem}", config_file.read_text())

    print("✅ Sync complete: PostgreSQL + LangMem + Redis")
```

## 📊 Data Flow Diagram
```
┌──────────────────────────────────────────────────────────────┐
│                       SOURCE_OF_TRUTH                        │
│       (Git-tracked, version-controlled, single source)       │
│                                                              │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐     │
│   │ Schemas │   │ Configs │   │ Prompts │   │Examples │     │
│   └─────────┘   └─────────┘   └─────────┘   └─────────┘     │
└──────────────────────────────────────────────────────────────┘
        ▼             ▼             ▼             ▼
┌──────────────────────────────────────────────────────────────┐
│                       GENERATION LAYER                       │
│                                                              │
│   scripts/generate_examples_from_configs.py                  │
│   scripts/sync_to_runtime_systems.py                         │
│                                                              │
│   AUTO-GENERATES:                                            │
│   • Training examples (JSONL)                                │
│   • Database inserts (SQL)                                   │
│   • Vector embeddings (LangMem)                              │
└──────────────────────────────────────────────────────────────┘
        ▼             ▼             ▼             ▼
┌──────────────────────────────────────────────────────────────┐
│                        RUNTIME LAYER                         │
│        (Queryable, cached, optimized for retrieval)          │
│                                                              │
│  ┌──────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐    │
│  │PostgreSQL│   │ LangMem │   │  Redis  │   │ Supabase │    │
│  │  (JSONB) │   │ (Vector)│   │ (Cache) │   │  (Sync)  │    │
│  └──────────┘   └─────────┘   └─────────┘   └──────────┘    │
└──────────────────────────────────────────────────────────────┘
        ▼             ▼             ▼             ▼
┌──────────────────────────────────────────────────────────────┐
│                      APPLICATION LAYER                       │
│                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   Prompts    │   │  Knowledge   │   │   Storage    │     │
│  │   Builders   │   │  Retrieval   │   │   Access     │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
│                                                              │
│  USED BY:                                                    │
│  • FastAPI endpoints                                         │
│  • LangGraph workflows                                       │
│  • MCP servers                                               │
└──────────────────────────────────────────────────────────────┘
```

## 🎯 Key Decisions & Rationale
### Decision 1: Why Move Configs to SOURCE_OF_TRUTH?

**Problem**: `tier_presets.v1.json` and `scoring_model.v1.json` were buried in `output-styles/config/business/`, making them hard to discover.

**Solution**:
- Move them to a top-level `SOURCE_OF_TRUTH/config/`
- Make it clear these are canonical business rules, not pipeline artifacts
- Enable versioning and change tracking via Git
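With the configs in one canonical location, application code can load them through a single helper instead of hard-coding `output-styles` paths. A minimal sketch, assuming the Level 1 layout above (`load_config` and `CONFIG_ROOT` are illustrative names, not existing code):

```python
import json
from pathlib import Path

# Assumed canonical root from the proposed Level 1 layout
CONFIG_ROOT = Path("database/SOURCE_OF_TRUTH/config")

def load_config(relative_path: str, root: Path = CONFIG_ROOT) -> dict:
    """Load a canonical config file and sanity-check its version field."""
    path = root / relative_path
    data = json.loads(path.read_text())
    if "version" not in data:
        raise ValueError(f"{path} is missing a 'version' field")
    return data
```

Callers would import this one helper, so a future move of the config tree is a one-line change.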
### Decision 2: Why Separate Examples from Config?

**Problem**: Examples were scattered across pipeline stages, making them difficult to use for RAG/training.

**Solution**:
- Store source examples in `SOURCE_OF_TRUTH/examples/`
- Store generated outputs in `output-styles/onboarding/{stage}/examples/`
- Use `generate/` scripts to create training examples from configs
### Decision 3: Why Keep Prompts Separate?

**Problem**: Prompt components, templates, and builders were in different places (`database/prompts/`, code modules).

**Solution**:
- Static templates → `SOURCE_OF_TRUTH/prompts/templates/`
- Prompt builders (code) → `database/prompts/builders/`
- Dynamic assembly → `database/knowledge/templates/`
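The split means the template file holds only static text while the builder fills it at runtime. A minimal sketch of that division, using the stdlib `string.Template` as a stand-in for the Jinja2/Mustache templates named above (the template text and `build_classification_prompt` are illustrative):

```python
from string import Template  # stdlib stand-in for Jinja2/Mustache

# In the proposed layout this text would live as a file under
# SOURCE_OF_TRUTH/prompts/templates/; inlined here for illustration.
STATIC_TEMPLATE = Template("Classify the league: $league_name ($category)")

def build_classification_prompt(league_name: str, category: str) -> str:
    """Builder-side assembly: fill the static template with runtime data."""
    return STATIC_TEMPLATE.substitute(league_name=league_name, category=category)
```

The builder module under `database/prompts/builders/` owns the runtime logic; editing prompt wording never requires touching code.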
### Decision 4: How Do We Avoid Duplication?

**Strategy**: Single Source of Truth + Generation Scripts

```
# One config file generates many artifacts:
tier_presets.v1.json (SOURCE_OF_TRUTH)
    │
    ├── Training examples (JSONL)
    ├── PostgreSQL rows (JSONB)
    ├── LangMem embeddings
    ├── Redis cache entries
    └── API response templates
```

## 📝 README Templates
### SOURCE_OF_TRUTH/README.md

````markdown
# Source of Truth

This directory contains all **canonical, version-controlled data** for the AltSports system.

## Philosophy

**Single Source of Truth**: All derived data (database records, embeddings, cache entries) is GENERATED from the files here.

## Structure

- **`schemas/`**: JSON Schema definitions (already exists)
- **`config/`**: Business configuration files (pricing, scoring, sports)
- **`prompts/`**: Static prompt templates (Jinja2/Mustache)
- **`examples/`**: Training and reference examples (JSONL format)

## Usage

1. **Edit files here** (version-controlled)
2. **Run generation scripts** to sync to runtime systems:
   ```bash
   python database/scripts/generate_examples_from_configs.py
   python database/scripts/sync_to_runtime_systems.py
   ```
3. **Verify the sync** in PostgreSQL, LangMem, Redis

## ⚠️ Important

- Never edit data in runtime systems directly
- Always update SOURCE_OF_TRUTH first
- Always run sync scripts after changes
````
### SOURCE_OF_TRUTH/config/README.md

````markdown
# Business Configuration Files

Canonical configuration for pricing, scoring, and sports logic.

## Files

### Pricing
- **`tier_presets.v1.json`**: Pricing tiers, SLAs, contract templates
- **`combat.pricing.v1.json`**: Combat sports vertical pricing

### Scoring
- **`scoring_model.v1.json`**: Scoring weights, modifiers, thresholds

### Sports
- **`archetypes.json`**: Sport classification rules
- **`betting_markets.json`**: Market definitions by sport

## Generation

These configs auto-generate:
- Training examples → `SOURCE_OF_TRUTH/examples/`
- Database records → PostgreSQL `business_config` table
- Vector embeddings → LangMem `business-rules` namespace

Run:
```bash
python database/scripts/sync_to_runtime_systems.py
```
````
### SOURCE_OF_TRUTH/examples/README.md

````markdown
# Training & Reference Examples

JSONL-formatted examples for LLM training, RAG, and testing.

## Format

All examples follow this structure:

```json
{
  "input": "User query or task description",
  "output": "Expected response or result",
  "metadata": {
    "type": "example_type",
    "source": "originating_config_file",
    "version": 1
  }
}
```

## Categories

- `onboarding/`: Questionnaire processing examples
- `sports_classification/`: Sport archetype examples
- `contract_generation/`: Contract assembly examples

## Auto-Generated Examples

Files ending in `generated_from_config.jsonl` are AUTO-GENERATED:

```bash
python database/scripts/generate_examples_from_configs.py
```

Do not edit these manually. Edit the source config instead.
````
---
## 📋 Implementation Checklist
### Week 1: Foundation
- [ ] Create `SOURCE_OF_TRUTH/` directory structure
- [ ] Write `SOURCE_OF_TRUTH/README.md`
- [ ] Copy (don't move) config files from `output-styles/config/`
- [ ] Create `database/scripts/generate_examples_from_configs.py`
- [ ] Test example generation locally
### Week 2: Sync Infrastructure
- [ ] Create PostgreSQL `business_config` table
- [ ] Write `database/scripts/sync_to_runtime_systems.py`
- [ ] Test PostgreSQL sync
- [ ] Test LangMem sync
- [ ] Add Redis caching layer
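The Week 2 table could be shaped along these lines. A sketch only: the column names mirror the sync script's `INSERT`, and the unique constraint is the target of its `ON CONFLICT` clause; the GIN index is an optional assumption, not a stated requirement.

```sql
CREATE TABLE IF NOT EXISTS business_config (
    id          BIGSERIAL PRIMARY KEY,
    config_type TEXT        NOT NULL,   -- e.g. 'tier_presets.v1' (file stem)
    version     INTEGER     NOT NULL DEFAULT 1,
    file_path   TEXT        NOT NULL,
    config_data JSONB       NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (config_type, version)       -- backs the ON CONFLICT upsert
);

-- Optional: index for querying inside the JSONB payload
CREATE INDEX IF NOT EXISTS idx_business_config_data
    ON business_config USING GIN (config_data);
```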
### Week 3: Prompt Organization
- [ ] Move static templates to `SOURCE_OF_TRUTH/prompts/templates/`
- [ ] Create `database/prompts/builders/` for code modules
- [ ] Update imports in existing code
- [ ] Test prompt building
### Week 4: Integration & Testing
- [ ] Update FastAPI endpoints to use new paths
- [ ] Update LangGraph workflows
- [ ] Update MCP servers
- [ ] Run full system test
- [ ] Document migration in CHANGELOG
### Week 5: Cleanup
- [ ] Remove old `output-styles/config/` (after verification)
- [ ] Update all README files
- [ ] Create developer documentation
- [ ] Train team on new structure
---
## 📖 Developer Guide
### How to Add a New Business Config

1. **Create the config in SOURCE_OF_TRUTH**:
   ```bash
   vim database/SOURCE_OF_TRUTH/config/business/my_new_config.v1.json
   ```
2. **Update the generation script**:
   ```python
   # database/scripts/generate_examples_from_configs.py
   def generate_my_new_examples():
       # Add logic here
       pass
   ```
3. **Run the sync**:
   ```bash
   python database/scripts/sync_to_runtime_systems.py
   ```
4. **Verify**:
   ```sql
   SELECT * FROM business_config WHERE config_type = 'my_new_config';
   ```
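Beyond the SQL check, generated example files can be validated structurally before syncing. A minimal sketch (`validate_examples_jsonl` is an illustrative helper, not an existing script) that enforces the `input`/`output`/`metadata` shape documented in the examples README:

```python
import json
from pathlib import Path

# Shape documented in SOURCE_OF_TRUTH/examples/README.md
REQUIRED_KEYS = {"input", "output", "metadata"}

def validate_examples_jsonl(path: Path) -> int:
    """Return the number of valid records; raise on the first malformed line."""
    count = 0
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises on invalid JSON
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"{path}:{lineno} missing keys: {sorted(missing)}")
        count += 1
    return count
```

Running this in CI against `SOURCE_OF_TRUTH/examples/**/*.jsonl` would catch malformed records before they reach LangMem.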
### How to Use Examples in RAG

```python
from langmem import LangMemClient

# Query examples
client = LangMemClient(namespace="business-rules")
results = client.query(
    query="What tier for a combat league?",
    filters={"type": "tier_recommendation"}
)

for result in results:
    print(f"Example: {result.content}")
    print(f"Metadata: {result.metadata}")
```

### How to Build Prompts
```python
from database.prompts.builders import onboarding_prompts

# Use a prompt builder
prompt = onboarding_prompts.build_tier_classification_prompt(
    league_data=league_json,
    config=tier_presets,
    examples=langmem_examples
)
```

## 📈 Success Metrics
After implementation, we should achieve:
- **Discoverability**: Developers find config files in < 30 seconds
- **Consistency**: Zero manual config updates to runtime systems
- **Versioning**: 100% of config changes tracked in Git
- **RAG Quality**: 30% improvement in tier recommendation accuracy
- **Maintainability**: 50% reduction in "where is X?" questions
## 🔗 Related Documentation

**Last Updated**: 2025-01-16
**Next Review**: After Phase 2 completion