Source: data_layer/docs/COMPREHENSIVE_ORGANIZATION_PLAN.md
# 🎯 Comprehensive Knowledge Organization Plan

## Executive Summary

This plan unifies FOUR interconnected knowledge systems into a coherent, discoverable architecture:

- `database/prompts/` - Prompt components, builders, and generation
- `database/storage/` - Runtime operational data (code-based examples module)
- `database/knowledge/` - Retrieval, embeddings, intent (code modules)
- `database/kb_catalog/` - Business rules, registries, manifests

**Core Philosophy**: Organize by LIFECYCLE STAGE (source → runtime → queryable), not by data type.
## 🏗️ Proposed Architecture

### Level 1: Source of Truth (Version-Controlled Definitions)

```
database/
└── SOURCE_OF_TRUTH/                      # NEW: Git-tracked, canonical data
    ├── schemas/                          # ✅ Already exists (keep as-is)
    │   ├── canonical/                    # JSON Schema (Draft 2020-12)
    │   ├── generated/                    # Auto-generated adapters
    │   └── domain/v1/drizzle/            # ✅ Drizzle schemas
    │
    ├── config/                           # NEW: Business configuration files
    │   ├── business/
    │   │   ├── pricing/
    │   │   │   ├── tier_presets.v1.json    # ← MOVE FROM output-styles/config
    │   │   │   ├── combat.pricing.v1.json  # ← MOVE FROM output-styles/config
    │   │   │   └── README.md
    │   │   ├── scoring/
    │   │   │   ├── scoring_model.v1.json   # ← MOVE FROM output-styles/config
    │   │   │   └── README.md
    │   │   └── README.md
    │   ├── sports/                       # NEW: Sport-specific configs
    │   │   ├── archetypes.json
    │   │   ├── betting_markets.json
    │   │   └── README.md
    │   └── README.md                     # Config governance doc
    │
    ├── prompts/                          # NEW: Static prompt templates
    │   ├── templates/                    # Jinja2/Mustache templates
    │   │   ├── onboarding/
    │   │   ├── classification/
    │   │   └── contract_generation/
    │   ├── components/                   # Reusable prompt fragments
    │   │   ├── system_instructions/
    │   │   ├── few_shot_examples/
    │   │   └── output_formats/
    │   └── README.md                     # Template usage guide
    │
    └── examples/                         # NEW: Training/reference examples
        ├── onboarding/
        │   ├── questionnaire_extraction/
        │   │   ├── examples.jsonl        # LangMem-ready format
        │   │   ├── metadata.json
        │   │   └── README.md
        │   ├── tier_classification/
        │   │   ├── examples.jsonl
        │   │   └── generated_from_config.jsonl   # AUTO-GENERATED
        │   └── contract_assembly/
        │       ├── examples.jsonl
        │       └── README.md
        ├── sports_classification/
        │   ├── by_archetype/
        │   └── by_market_readiness/
        └── README.md                     # Example governance
```

### Level 2: Runtime Services (Operational Layer)
```
database/
├── knowledge/                # ✅ Keep as-is (Python modules)
│   ├── embeddings/           # Vector generation
│   ├── intent/               # Query classification
│   ├── retrieval/            # RAG operations
│   ├── storage/              # Vector DB interface
│   └── templates/            # Dynamic prompt assembly
│
├── storage/                  # ✅ Keep as-is (Python modules)
│   ├── examples/             # Code module for example access
│   ├── postgres/             # PostgreSQL operations
│   ├── redis/                # Cache layer
│   └── supabase/             # Supabase operations
│
└── prompts/                  # ✅ Enhance (add builders/)
    ├── builders/             # NEW: Prompt construction code
    │   ├── onboarding_prompts.py
    │   ├── classification_prompts.py
    │   └── contract_prompts.py
    ├── registry/             # Prompt metadata
    └── README.md
```

### Level 3: Business Intelligence (Queryable Layer)
```
database/
├── kb_catalog/               # ✅ Keep as-is (enhanced)
│   ├── manifests/            # System inventories
│   ├── registry/             # Component registries
│   ├── constants/            # Enum-like data
│   └── config/               # Catalog configuration
│
└── output-styles/            # ✅ Restructure (remove config/)
    ├── onboarding/           # Pipeline stages
    │   ├── 02-ingest-validate-questionnaire/
    │   ├── 03-enhance-documents/
    │   ├── 04-classify/
    │   ├── 05-upsert-and-crossref/
    │   ├── 06-suggest-tiers-and-terms/
    │   │   ├── example_seeds/    # ← Keep (synthetic seeds)
    │   │   ├── examples/         # ← Keep (generated outputs)
    │   │   ├── generate/         # ← Keep (generation code)
    │   │   ├── models/
    │   │   ├── schema/
    │   │   └── README.md
    │   ├── 07-assemble-contract/
    │   ├── 07a-output-contract-export/
    │   ├── 07b-output-gamekeeper-scorekeeper-ui/
    │   └── 07c-output-marketing-nxt-onboarding-materials/
    └── README-ORGANIZATION.md
```

## 📦 Migration Plan
### Phase 1: Create New Structure (No Deletions)

```bash
# 1. Create SOURCE_OF_TRUTH hierarchy
mkdir -p database/SOURCE_OF_TRUTH/{config,prompts,examples}
mkdir -p database/SOURCE_OF_TRUTH/config/business/{pricing,scoring}
mkdir -p database/SOURCE_OF_TRUTH/config/sports
mkdir -p database/SOURCE_OF_TRUTH/prompts/{templates,components}
mkdir -p database/SOURCE_OF_TRUTH/examples/onboarding/{questionnaire_extraction,tier_classification,contract_assembly}

# 2. Copy (don't move yet) config files
cp database/output-styles/config/business/pricing/tier_presets.v1.json \
   database/SOURCE_OF_TRUTH/config/business/pricing/
cp database/output-styles/config/business/pricing/combat.pricing.v1.json \
   database/SOURCE_OF_TRUTH/config/business/pricing/
cp database/output-styles/config/business/scoring/scoring_model.v1.json \
   database/SOURCE_OF_TRUTH/config/business/scoring/

# 3. Create README files
touch database/SOURCE_OF_TRUTH/README.md
touch database/SOURCE_OF_TRUTH/config/README.md
touch database/SOURCE_OF_TRUTH/config/business/README.md
touch database/SOURCE_OF_TRUTH/prompts/README.md
touch database/SOURCE_OF_TRUTH/examples/README.md
```

### Phase 2: Generate Derived Examples
```python
# database/scripts/generate_examples_from_configs.py
"""
Generate training examples from SOURCE_OF_TRUTH configs.
Output to SOURCE_OF_TRUTH/examples/ in JSONL format.
"""
import json
from pathlib import Path


def format_pricing_response(tier_data: dict) -> str:
    """Render a tier's pricing terms as text (placeholder implementation)."""
    return json.dumps(tier_data.get("pricing", tier_data), indent=2)


def format_justification(tier_data: dict) -> str:
    """Render a short justification for recommending a tier (placeholder)."""
    return tier_data.get("justification", "matches the league's category profile")


def generate_tier_examples():
    """Convert tier_presets.v1.json into LangMem-ready examples."""
    config_path = Path("database/SOURCE_OF_TRUTH/config/business/pricing/tier_presets.v1.json")
    output_path = Path("database/SOURCE_OF_TRUTH/examples/onboarding/tier_classification/generated_from_config.jsonl")

    with open(config_path) as f:
        config = json.load(f)

    examples = []
    for tier_name, tier_data in config['tiers'].items():
        # Example 1: Pricing lookup
        examples.append({
            "input": f"What are the pricing terms for {tier_name}?",
            "output": format_pricing_response(tier_data),
            "metadata": {
                "tier": tier_name,
                "type": "pricing_lookup",
                "source": "tier_presets.v1.json",
                "version": config['version']
            }
        })

        # Example 2: Tier recommendation
        if "example_category" in tier_data:
            examples.append({
                "input": f"What tier should I recommend for a {tier_data['example_category']} league?",
                "output": f"Recommend {tier_name} because: {format_justification(tier_data)}",
                "metadata": {
                    "tier": tier_name,
                    "type": "tier_recommendation",
                    "category": tier_data['example_category']
                }
            })

    # Write as JSONL
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, 'w') as f:
        for ex in examples:
            f.write(json.dumps(ex) + '\n')

    print(f"✅ Generated {len(examples)} examples to {output_path}")


if __name__ == "__main__":
    generate_tier_examples()
```

### Phase 3: Sync to Runtime Systems
```python
# database/scripts/sync_to_runtime_systems.py
"""
Sync SOURCE_OF_TRUTH data to:
- PostgreSQL (JSONB for querying)
- LangMem (vector embeddings for RAG)
- Redis (cache for hot data)
"""
import json
import os
from pathlib import Path

import psycopg2
import redis
from langmem import LangMemClient

DATABASE_URL = os.environ["DATABASE_URL"]


def sync_configs_to_databases():
    """Multi-storage sync strategy."""
    # 1. PostgreSQL: Queryable business rules
    pg = psycopg2.connect(DATABASE_URL)
    cur = pg.cursor()
    config_files = Path("database/SOURCE_OF_TRUTH/config").rglob("*.json")
    for config_file in config_files:
        with open(config_file) as f:
            data = json.load(f)

        # Upsert as versioned JSONB
        cur.execute("""
            INSERT INTO business_config
                (config_type, version, file_path, config_data, updated_at)
            VALUES (%s, %s, %s, %s, NOW())
            ON CONFLICT (config_type, version)
            DO UPDATE SET
                config_data = EXCLUDED.config_data,
                updated_at = NOW()
        """, (
            config_file.stem,
            data.get('version', 1),
            str(config_file),
            json.dumps(data)
        ))
    pg.commit()

    # 2. LangMem: Semantic search
    langmem = LangMemClient(namespace="business-rules")
    example_files = Path("database/SOURCE_OF_TRUTH/examples").rglob("*.jsonl")
    for example_file in example_files:
        with open(example_file) as f:
            for line in f:
                example = json.loads(line)
                langmem.store(
                    content=f"{example['input']}\n\n{example['output']}",
                    metadata={
                        **example.get('metadata', {}),
                        "source_file": str(example_file)
                    }
                )

    # 3. Redis: cache hot configs for fast reads
    cache = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
    for config_file in Path("database/SOURCE_OF_TRUTH/config").rglob("*.json"):
        cache.set(f"config:{config_file.stem}", config_file.read_text())

    print("✅ Sync complete: PostgreSQL + LangMem + Redis")
```

## 📊 Data Flow Diagram
```
┌──────────────────────────────────────────────────────────────┐
│                       SOURCE_OF_TRUTH                        │
│       (Git-tracked, version-controlled, single source)       │
│                                                              │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐     │
│   │ Schemas │   │ Configs │   │ Prompts │   │Examples │     │
│   └─────────┘   └─────────┘   └─────────┘   └─────────┘     │
└──────────────────────────────────────────────────────────────┘
        ▼             ▼             ▼             ▼
┌──────────────────────────────────────────────────────────────┐
│                       GENERATION LAYER                       │
│                                                              │
│   scripts/generate_examples_from_configs.py                  │
│   scripts/sync_to_runtime_systems.py                         │
│                                                              │
│   AUTO-GENERATES:                                            │
│   • Training examples (JSONL)                                │
│   • Database inserts (SQL)                                   │
│   • Vector embeddings (LangMem)                              │
└──────────────────────────────────────────────────────────────┘
        ▼             ▼             ▼             ▼
┌──────────────────────────────────────────────────────────────┐
│                        RUNTIME LAYER                         │
│        (Queryable, cached, optimized for retrieval)          │
│                                                              │
│  ┌──────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐    │
│  │PostgreSQL│   │ LangMem │   │  Redis  │   │ Supabase │    │
│  │  (JSONB) │   │ (Vector)│   │ (Cache) │   │  (Sync)  │    │
│  └──────────┘   └─────────┘   └─────────┘   └──────────┘    │
└──────────────────────────────────────────────────────────────┘
        ▼             ▼             ▼             ▼
┌──────────────────────────────────────────────────────────────┐
│                      APPLICATION LAYER                       │
│                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   Prompts    │   │  Knowledge   │   │   Storage    │     │
│  │   Builders   │   │  Retrieval   │   │   Access     │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
│                                                              │
│  USED BY:                                                    │
│  • FastAPI endpoints                                         │
│  • LangGraph workflows                                       │
│  • MCP servers                                               │
└──────────────────────────────────────────────────────────────┘
```

## 🎯 Key Decisions & Rationale
### Decision 1: Why Move Configs to SOURCE_OF_TRUTH?

**Problem**: `tier_presets.v1.json` and `scoring_model.v1.json` were buried in `output-styles/config/business/`, making them hard to discover.

**Solution**:
- Move them to a top-level `SOURCE_OF_TRUTH/config/`
- Make it clear these are canonical business rules, not pipeline artifacts
- Enable versioning and change tracking via Git
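With the configs in one canonical location, application code can load them through a single helper instead of hard-coding `output-styles` paths. A minimal sketch, assuming the Level 1 layout above (`load_config` and `CONFIG_ROOT` are illustrative names, not existing code):

```python
import json
from pathlib import Path

# Assumed canonical root from the proposed Level 1 layout
CONFIG_ROOT = Path("database/SOURCE_OF_TRUTH/config")

def load_config(relative_path: str, root: Path = CONFIG_ROOT) -> dict:
    """Load a canonical config file and sanity-check its version field."""
    path = root / relative_path
    data = json.loads(path.read_text())
    if "version" not in data:
        raise ValueError(f"{path} is missing a 'version' field")
    return data
```

Callers would import this one helper, so a future move of the config tree is a one-line change.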
### Decision 2: Why Separate Examples from Config?

**Problem**: Examples were scattered across pipeline stages, making them difficult to use for RAG/training.

**Solution**:
- Store source examples in `SOURCE_OF_TRUTH/examples/`
- Store generated outputs in `output-styles/onboarding/{stage}/examples/`
- Use `generate/` scripts to create training examples from configs
### Decision 3: Why Keep Prompts Separate?

**Problem**: Prompt components, templates, and builders were in different places (`database/prompts/`, code modules).

**Solution**:
- Static templates → `SOURCE_OF_TRUTH/prompts/templates/`
- Prompt builders (code) → `database/prompts/builders/`
- Dynamic assembly → `database/knowledge/templates/`
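The split means the template file holds only static text while the builder fills it at runtime. A minimal sketch of that division, using the stdlib `string.Template` as a stand-in for the Jinja2/Mustache templates named above (the template text and `build_classification_prompt` are illustrative):

```python
from string import Template  # stdlib stand-in for Jinja2/Mustache

# In the proposed layout this text would live as a file under
# SOURCE_OF_TRUTH/prompts/templates/; inlined here for illustration.
STATIC_TEMPLATE = Template("Classify the league: $league_name ($category)")

def build_classification_prompt(league_name: str, category: str) -> str:
    """Builder-side assembly: fill the static template with runtime data."""
    return STATIC_TEMPLATE.substitute(league_name=league_name, category=category)
```

The builder module under `database/prompts/builders/` owns the runtime logic; editing prompt wording never requires touching code.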
### Decision 4: How Do We Avoid Duplication?

**Strategy**: Single Source of Truth + Generation Scripts

```
# One config file generates many artifacts:
tier_presets.v1.json (SOURCE_OF_TRUTH)
    │
    ├── Training examples (JSONL)
    ├── PostgreSQL rows (JSONB)
    ├── LangMem embeddings
    ├── Redis cache entries
    └── API response templates
```

## 📝 README Templates
### SOURCE_OF_TRUTH/README.md

````markdown
# Source of Truth

This directory contains all **canonical, version-controlled data** for the AltSports system.

## Philosophy

**Single Source of Truth**: All derived data (database records, embeddings, cache entries) is GENERATED from the files here.

## Structure

- **`schemas/`**: JSON Schema definitions (already exists)
- **`config/`**: Business configuration files (pricing, scoring, sports)
- **`prompts/`**: Static prompt templates (Jinja2/Mustache)
- **`examples/`**: Training and reference examples (JSONL format)

## Usage

1. **Edit files here** (version-controlled)
2. **Run generation scripts** to sync to runtime systems:
   ```bash
   python database/scripts/generate_examples_from_configs.py
   python database/scripts/sync_to_runtime_systems.py
   ```
3. **Verify the sync** in PostgreSQL, LangMem, Redis

## ⚠️ Important

- Never edit data in runtime systems directly
- Always update SOURCE_OF_TRUTH first
- Always run sync scripts after changes
````
### SOURCE_OF_TRUTH/config/README.md

````markdown
# Business Configuration Files

Canonical configuration for pricing, scoring, and sports logic.

## Files

### Pricing
- **`tier_presets.v1.json`**: Pricing tiers, SLAs, contract templates
- **`combat.pricing.v1.json`**: Combat sports vertical pricing

### Scoring
- **`scoring_model.v1.json`**: Scoring weights, modifiers, thresholds

### Sports
- **`archetypes.json`**: Sport classification rules
- **`betting_markets.json`**: Market definitions by sport

## Generation

These configs auto-generate:
- Training examples → `SOURCE_OF_TRUTH/examples/`
- Database records → PostgreSQL `business_config` table
- Vector embeddings → LangMem `business-rules` namespace

Run:
```bash
python database/scripts/sync_to_runtime_systems.py
```
````
### SOURCE_OF_TRUTH/examples/README.md

````markdown
# Training & Reference Examples

JSONL-formatted examples for LLM training, RAG, and testing.

## Format

All examples follow this structure:

```json
{
  "input": "User query or task description",
  "output": "Expected response or result",
  "metadata": {
    "type": "example_type",
    "source": "originating_config_file",
    "version": 1
  }
}
```

## Categories

- `onboarding/`: Questionnaire processing examples
- `sports_classification/`: Sport archetype examples
- `contract_generation/`: Contract assembly examples

## Auto-Generated Examples

Files ending in `generated_from_config.jsonl` are AUTO-GENERATED:

```bash
python database/scripts/generate_examples_from_configs.py
```

Do not edit these manually. Edit the source config instead.
````
---
## 📋 Implementation Checklist
### Week 1: Foundation
- [ ] Create `SOURCE_OF_TRUTH/` directory structure
- [ ] Write `SOURCE_OF_TRUTH/README.md`
- [ ] Copy (don't move) config files from `output-styles/config/`
- [ ] Create `database/scripts/generate_examples_from_configs.py`
- [ ] Test example generation locally
### Week 2: Sync Infrastructure
- [ ] Create PostgreSQL `business_config` table
- [ ] Write `database/scripts/sync_to_runtime_systems.py`
- [ ] Test PostgreSQL sync
- [ ] Test LangMem sync
- [ ] Add Redis caching layer
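The Week 2 table could be shaped along these lines. A sketch only: the column names mirror the sync script's `INSERT`, and the unique constraint is the target of its `ON CONFLICT` clause; the GIN index is an optional assumption, not a stated requirement.

```sql
CREATE TABLE IF NOT EXISTS business_config (
    id          BIGSERIAL PRIMARY KEY,
    config_type TEXT        NOT NULL,   -- e.g. 'tier_presets.v1' (file stem)
    version     INTEGER     NOT NULL DEFAULT 1,
    file_path   TEXT        NOT NULL,
    config_data JSONB       NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (config_type, version)       -- backs the ON CONFLICT upsert
);

-- Optional: index for querying inside the JSONB payload
CREATE INDEX IF NOT EXISTS idx_business_config_data
    ON business_config USING GIN (config_data);
```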
### Week 3: Prompt Organization
- [ ] Move static templates to `SOURCE_OF_TRUTH/prompts/templates/`
- [ ] Create `database/prompts/builders/` for code modules
- [ ] Update imports in existing code
- [ ] Test prompt building
### Week 4: Integration & Testing
- [ ] Update FastAPI endpoints to use new paths
- [ ] Update LangGraph workflows
- [ ] Update MCP servers
- [ ] Run full system test
- [ ] Document migration in CHANGELOG
### Week 5: Cleanup
- [ ] Remove old `output-styles/config/` (after verification)
- [ ] Update all README files
- [ ] Create developer documentation
- [ ] Train team on new structure
---
## 📖 Developer Guide
### How to Add a New Business Config

1. **Create the config in SOURCE_OF_TRUTH**:
   ```bash
   vim database/SOURCE_OF_TRUTH/config/business/my_new_config.v1.json
   ```
2. **Update the generation script**:
   ```python
   # database/scripts/generate_examples_from_configs.py
   def generate_my_new_examples():
       # Add logic here
       pass
   ```
3. **Run the sync**:
   ```bash
   python database/scripts/sync_to_runtime_systems.py
   ```
4. **Verify**:
   ```sql
   SELECT * FROM business_config WHERE config_type = 'my_new_config';
   ```
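Beyond the SQL check, generated example files can be validated structurally before syncing. A minimal sketch (`validate_examples_jsonl` is an illustrative helper, not an existing script) that enforces the `input`/`output`/`metadata` shape documented in the examples README:

```python
import json
from pathlib import Path

# Shape documented in SOURCE_OF_TRUTH/examples/README.md
REQUIRED_KEYS = {"input", "output", "metadata"}

def validate_examples_jsonl(path: Path) -> int:
    """Return the number of valid records; raise on the first malformed line."""
    count = 0
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises on invalid JSON
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"{path}:{lineno} missing keys: {sorted(missing)}")
        count += 1
    return count
```

Running this in CI against `SOURCE_OF_TRUTH/examples/**/*.jsonl` would catch malformed records before they reach LangMem.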
### How to Use Examples in RAG

```python
from langmem import LangMemClient

# Query examples
client = LangMemClient(namespace="business-rules")
results = client.query(
    query="What tier for a combat league?",
    filters={"type": "tier_recommendation"}
)

for result in results:
    print(f"Example: {result.content}")
    print(f"Metadata: {result.metadata}")
```

### How to Build Prompts
```python
from database.prompts.builders import onboarding_prompts

# Use a prompt builder
prompt = onboarding_prompts.build_tier_classification_prompt(
    league_data=league_json,
    config=tier_presets,
    examples=langmem_examples
)
```

## 📈 Success Metrics
After implementation, we should achieve:
- **Discoverability**: Developers find config files in < 30 seconds
- **Consistency**: Zero manual config updates to runtime systems
- **Versioning**: 100% of config changes tracked in Git
- **RAG Quality**: 30% improvement in tier recommendation accuracy
- **Maintainability**: 50% reduction in "where is X?" questions
## 🔗 Related Documentation

**Last Updated**: 2025-01-16
**Next Review**: After Phase 2 completion