Source: data_layer/docs/COMPLETE_SYSTEM_ARCHITECTURE.md
# Prompt Intelligence System - Complete Architecture

## The Complete Picture

A self-improving prompt intelligence system that:

- Seeds from static files (`database/prompts/`)
- Caches in LangMem (`InMemoryStore`)
- Persists to databases (Firebase + Supabase)
- Improves over time (update → re-cache → re-seed)
- Serves via REST API + MCP server
- Orchestrates multi-agent workflows

## Complete Data Flow Cycle
```
┌──────────────────────────────────────────────────────────┐
│                   SEED PHASE (Startup)                   │
└──────────────────────────────────────────────────────────┘
database/prompts/*.md      (135 files)      - SEEDS
database/output-styles/    (800+ examples)  - SEEDS
        ↓ Read and parse
Build workflows with examples + schemas
        ↓ Seed phase
    ┌────────────┬────────────┬──────────────┬────────────┐
    ↓            ↓            ↓              ↓
Supabase      Firebase     InMemoryStore   ChromaDB
(League data) (User data)  (Fast cache)    (Optional)
• Examples    • Workflows  • All prompts   • Vector search
• Analytics   • Preferences • <1ms access    backup
```
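The seed-phase fan-out above (one parsed record written to every store) amounts to a broadcast write. A minimal sketch — the sink callables below are illustrative stand-ins for the Supabase, Firebase, and InMemoryStore clients, not the real APIs:

```python
def seed_all(record, sinks):
    """Broadcast one parsed seed record to every configured store."""
    for name, write in sinks.items():
        write(record)  # each backend receives the same parsed seed

# Stand-in backends for illustration
supabase_rows, firebase_docs, memory_cache = [], [], {}
sinks = {
    "supabase": supabase_rows.append,                              # league examples
    "firebase": firebase_docs.append,                              # workflows/catalog
    "inmemory": lambda r: memory_cache.__setitem__(r["name"], r),  # fast cache
}

seed_all({"name": "pdf_agent", "version": 1}, sinks)
print(len(supabase_rows), len(firebase_docs), "pdf_agent" in memory_cache)  # 1 1 True
```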
```
┌──────────────────────────────────────────────────────────┐
│                  QUERY PHASE (Runtime)                   │
└──────────────────────────────────────────────────────────┘
User/AI queries for prompt
        ↓
Check InMemoryStore (<1ms)
  ├─ HIT:  Return immediately ⚡
  └─ MISS: Check databases
       ├─ Check Supabase (league data)    ~20ms
       ├─ Check Firebase (user/workflow)  ~20ms
       └─ Fallback to files               ~9ms
        ↓
Cache result in InMemoryStore
        ↓
Future queries: <1ms ⚡
```
```
┌──────────────────────────────────────────────────────────┐
│               IMPROVEMENT PHASE (Continuous)             │
└──────────────────────────────────────────────────────────┘
User provides feedback/suggestions
        ↓
POST /api/prompts/update
{"suggestions": ["Add NBA examples", "Improve pricing"]}
        ↓
Get current version from InMemoryStore
        ↓
Apply improvements (increment version)
        ↓
Update InMemoryStore (immediate)
        ↓ Background async
    ┌────────────┬────────────┐
    ↓            ↓            ↓
Firebase     Supabase     Re-seed flag
(Updated)    (Analytics)  (For rebuild)
    ↓            ↓            ↓
Next query   Performance  Next startup
uses v2      tracking     uses v2
```

## Five-Layer Architecture
### Layer 0: SEED FILES (database/)

Purpose: Source of truth, Git versioned, human-editable

```
database/
├── prompts/
│   ├── agents/*.md       # 28 agent seed prompts
│   ├── commands/*.md     # 26 command seed prompts
│   └── workflows/*.md    # Workflow seed prompts
│
└── output-styles/
    └── league_questionnaire_to_contract/
        ├── stage_2/  (examples, schema) - 562 examples
        ├── stage_3/  (examples, schema) - 9 examples
        ├── stage_4/  (examples, schema) - 3 examples
        ├── stage_5/  (examples, schema) - 9 examples
        ├── stage_6/  (examples, schema) - 55 examples
        ├── stage_7a/ (examples, schema) - 153 examples
        └── stage_7b/ (examples, schema) - 22 examples
```

TOTAL: 135 prompt seeds, 800+ example seeds

### Layer 1: DATABASES (Seeded from files)
Purpose: Persistent storage, seeded at startup, updated on improvements
Supabase (League-specific data):

```sql
-- Seeded from database/output-styles/
CREATE TABLE league_examples (
    sport TEXT,
    tier TEXT,
    stage TEXT,
    example_data JSONB,
    source_file TEXT  -- Where it was seeded from
);

-- Populated with 800+ examples: the build script parses
-- database/output-styles/*/examples/*.json and inserts one row per example.
```

Firebase (User + workflow data):
```
// Seeded from database/prompts/
prompts/
  workflows/
    questionnaire_to_contract: {
      stages: [...],
      seeded_from: "database/output-styles/",
      version: 1
    }
  catalog/
    all_prompts: [...135 prompts...],
    seeded_at: timestamp
```

### Layer 2: LANGMEM CACHE (InMemoryStore)
Purpose: Fast in-memory cache, populated from DBs, <1ms retrieval
```python
# Populated at startup from databases
store = InMemoryStore(index={"embed": "openai:text-embedding-3-small"})

# Namespaces
("workflows", "questionnaire_to_contract")  # → Full 9-stage workflow
("prompts", "pdf_agent")                    # → Individual agent prompts
("examples", "premium_basketball")          # → Filtered examples
("schemas", "ExtractedData")                # → Validation schemas
```

Data source priority:

- If in InMemoryStore → return (<1ms)
- If in Firebase/Supabase → cache and return (~20ms)
- If only in files → build, seed DBs, cache, return (~9ms)
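This priority order is a read-through cache. A minimal pure-Python sketch of the pattern — the dict-backed `cache` and `db` here are illustrative stand-ins for InMemoryStore and Firebase/Supabase:

```python
cache = {}  # stands in for InMemoryStore (<1ms)
db = {"pdf_agent": {"prompt": "Extract fields from the PDF...", "version": 1}}  # stands in for Firebase/Supabase (~20ms)

def get_prompt(name):
    """Return a prompt record, caching backing-store hits for future fast reads."""
    if name in cache:          # tier 1: in-memory hit
        return cache[name]
    value = db.get(name)       # tier 2: database lookup
    if value is not None:
        cache[name] = value    # populate the cache on a miss
    return value               # None would mean "fall back to seed files"

first = get_prompt("pdf_agent")   # miss: fetched from db, then cached
second = get_prompt("pdf_agent")  # hit: served from the cache
print(first is second)            # True - both reads return the cached record
```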
### Layer 3: SERVICES (apps/backend/services/)

Purpose: Business logic, workflow orchestration, agent creation

```
services/
├── prompts.py                 # Workflow execution
│     • Get from store
│     • Build LangGraph
│     • Execute pipeline
│     • Track analytics
│
├── orchestrator.py            # Agent creation
│     • Parse workflow stages
│     • Create agents with tools
│     • Coordinate execution
│
└── agent_communication.py     # Agent-to-agent
      • Message routing
      • Context sharing
      • Workflow coordination
```

### Layer 4: API (apps/backend/api/)
Purpose: REST endpoints for humans + frontends
```
api/prompts.py:
GET  /api/prompts/catalog   # List all seeds
POST /api/prompts/search    # Semantic search
POST /api/prompts/update    # Improve (re-seed)
POST /api/prompts/execute   # Run workflow
POST /api/prompts/batch     # Parallel execution
```

### Layer 5: MCP SERVER (apps/backend/mcp/)
Purpose: AI-to-AI communication, self-discovery
```
mcp/prompt_server.py:
GET  /mcp/discover            # What can you do?
POST /mcp/rpc                 # Execute via RPC
POST /mcp/agent/communicate   # Agent-to-agent
```

## Seeding Strategy (Complete)
### Phase 1: Initial Seed (Deployment)

```
# database/scripts/build.py (ENHANCED)

1. Read seed files:
   ├─ database/prompts/agents/*.md              (28 seeds)
   ├─ database/prompts/commands/*.md            (26 seeds)
   └─ database/output-styles/*/examples/*.json  (800+ seeds)

2. Seed Supabase (league-specific):
   ├─ league_examples (800+ rows)
   │     GROUP BY sport (basketball, soccer, hockey...)
   │     GROUP BY tier (premium, professional, standard)
   │     GROUP BY stage (stage_2 through stage_7)
   │
   ├─ workflow_definitions (workflow metadata)
   └─ prompt_catalog (all 135 prompts indexed)

3. Seed Firebase (user-specific + workflows):
   ├─ prompts/workflows (critical workflows)
   ├─ prompts/catalog (all 135 prompts)
   └─ system/seed_info (when seeded, from what version)

4. Populate InMemoryStore (fast cache):
   └─ All workflows, prompts, examples in memory
```

### Phase 2: Query Phase (Runtime)
```python
# When a query arrives
async def get_prompt(name: str):
    # 1. Check InMemoryStore (cache)
    result = store.get(("workflows",), name)
    if result:
        return result.value  # <1ms ⚡

    # 2. Check Supabase (seeded league data)
    db_result = await supabase.table("workflow_definitions").select("*").eq("name", name).single().execute()
    if db_result.data:
        store.put(("workflows",), name, db_result.data)  # Cache it
        return db_result.data  # ~20ms

    # 3. Check Firebase (seeded user/workflow data)
    fb_result = await firebase.get(f"prompts/workflows/{name}")
    if fb_result:
        store.put(("workflows",), name, fb_result)  # Cache it
        return fb_result  # ~20ms

    # 4. Build from seed files (fallback)
    file_result = build_from_database_files(name)
    store.put(("workflows",), name, file_result)  # Cache it
    await seed_to_databases(file_result)          # Seed for next time
    return file_result  # ~9ms
```

### Phase 3: Improvement Phase (Continuous)
```python
# When a user improves a prompt
async def update_prompt(name: str, suggestions: List[str]):
    # 1. Get the current version from InMemoryStore
    current = store.get(("workflows",), name).value

    # 2. Apply improvements
    improved = {
        **current,
        "suggestions_applied": suggestions,
        "version": current["version"] + 1,
        "updated_at": datetime.now(),
    }

    # 3. Update InMemoryStore (immediate)
    store.put(("workflows",), name, improved)

    # 4. Re-seed databases (background)
    await supabase.table("workflow_definitions").upsert(improved).execute()  # Update seed
    await firebase.set(f"prompts/workflows/{name}", improved)                # Update seed

    # 5. The next query gets the improved version
    # 6. On container restart, the improved version loads from the DB
```

## Agent Orchestration Flow
### Example: Execute `questionnaire_to_contract`

```
# User/AI request
POST /api/prompts/execute
{
  "workflow": "questionnaire_to_contract",
  "input_data": {"questionnaire_text": "..."},
  "use_orchestrator": true   # Create agents before execution
}
```

```python
# apps/backend/api/prompts.py
if use_orchestrator:
    # Use the orchestrator service
    result = await orchestrator.orchestrate_workflow(workflow, input_data)
else:
    # Direct execution
    result = await service.execute_workflow(workflow, input_data)
```

```
# apps/backend/services/orchestrator.py

1. Get workflow from store (<1ms from InMemoryStore/DB seeds)
   └─ Returns: 9 stages with prompts + examples + schemas

2. Create specialized agents (one per stage):
   ├─ Agent 1: "ExtractAgent"
   │    ├─ Prompt:   stage_2 prompt (from seeds)
   │    ├─ Tools:    [pdf_processor, ocr_tool]
   │    └─ Examples: 562 extraction examples (from seeds)
   │
   ├─ Agent 2: "EnhanceAgent"
   │    ├─ Prompt:   stage_3 prompt (from seeds)
   │    ├─ Tools:    [data_enricher, api_fetcher]
   │    └─ Examples: 9 enhancement examples (from seeds)
   │
   ├─ ... (6 more agents for the intermediate stages)
   │
   └─ Agent 9: "ExportAgent"
        ├─ Prompt:   stage_7b prompt (from seeds)
        ├─ Tools:    [markdown_formatter, pdf_generator]
        └─ Examples: 22 export examples (from seeds)

3. Execute workflow with the agent crew:
   ├─ Agent 1 processes with its tools
   ├─ Passes output to Agent 2
   ├─ Agent 2 processes with its tools
   ├─ ... sequential or parallel
   └─ Agent 9 produces the final output

4. Return result
```

## Intelligence Evolution (Seed → Improve → Re-Seed)
Lifecycle:

```
GENERATION 1 (Initial):
─────────────────────────
Seeds: database/prompts/*.md (v1)
        ↓ Build at startup
InMemoryStore: Cached (v1)
        ↓ Seed DBs
Supabase/Firebase: Seeded (v1)
        ↓ Execute workflows
Performance: 85% success rate

USER FEEDBACK:
─────────────────────────
"NBA tier classification is inaccurate"
"Need more basketball examples"

POST /api/prompts/update
{
  "suggestions": [
    "Add 20 NBA-specific examples",
    "Improve tier logic for major leagues",
    "Include revenue thresholds"
  ]
}

GENERATION 2 (Improved):
─────────────────────────
InMemoryStore: Updated (v2)        ← Immediate
        ↓ Background sync
Supabase/Firebase: Re-seeded (v2)
        ↓ Future restarts
Container restart: Loads v2 from DB
        ↓ Execute workflows
Performance: 92% success rate      ← Improved!

GENERATION 3 (Further refined):
─────────────────────────
[Cycle continues...]
```

### Key Point: Seeds Evolve Over Time
- Start: Static seeds from files
- Improve: Update via API
- Re-seed: Databases get updated versions
- Persist: Next startup loads improved seeds
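One detail this cycle implies: at startup, the loader must prefer a higher-versioned database seed over the static file seed, otherwise improvements would be silently overwritten on every restart. A minimal sketch of that rule — function and field names here are illustrative, not the actual build.py API:

```python
def choose_seed(file_seed, db_seed):
    """Prefer the improved DB copy over the static file seed when it is newer."""
    if db_seed is None:  # first boot: nothing has been seeded yet
        return file_seed
    if db_seed.get("version", 1) >= file_seed.get("version", 1):
        return db_seed   # keep the accumulated improvements
    return file_seed     # the file seed was bumped past the DB copy

file_seed = {"name": "pdf_agent", "version": 1}
db_seed = {"name": "pdf_agent", "version": 2, "suggestions_applied": ["Add NBA examples"]}

print(choose_seed(file_seed, db_seed)["version"])  # 2 - restart keeps the improved seed
print(choose_seed(file_seed, None)["version"])     # 1 - first boot uses the file seed
```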
## Database Seeding Detail
Supabase Seeding (League-Specific Data):

```python
# database/scripts/build.py - Seed Supabase
async def seed_supabase_from_files():
    """
    Seed Supabase with league examples,
    organized by sport/tier for fast filtering.
    """
    pipeline_path = Path("database/output-styles/league_questionnaire_to_contract")

    for stage_dir in pipeline_path.glob("league_questionnaire_*"):
        examples_dir = stage_dir / "examples"
        if examples_dir.exists():
            for example_file in examples_dir.glob("*.json"):
                example = json.loads(example_file.read_text())

                # Extract metadata
                sport = example.get("sport", "general")
                tier = example.get("tier", "standard")
                stage = stage_dir.name

                # Seed to Supabase
                await supabase.table("league_examples").upsert({
                    "sport": sport,
                    "tier": tier,
                    "stage": stage,
                    "example_data": example,
                    "source_file": str(example_file),
                    "seeded_at": datetime.now().isoformat(),
                }).execute()

    logger.info("Seeded Supabase with 800+ examples")
```

Firebase Seeding (Workflows + User Data):
```python
# database/scripts/build.py - Seed Firebase
async def seed_firebase_from_files():
    """
    Seed Firebase with workflows and the prompt catalog.
    Critical for cross-instance sync.
    """
    # Build workflow
    workflow = await build_workflow_from_files("questionnaire_to_contract")

    # Seed to Firebase
    await firebase.db.collection("prompts").document("workflows").set({
        "questionnaire_to_contract": workflow,
        "_metadata": {
            "seeded_from": "database/output-styles/",
            "seeded_at": datetime.now().isoformat(),
            "total_stages": len(workflow.get("stages", [])),
            "example_count": sum(len(s.get("examples", [])) for s in workflow["stages"]),
        }
    })

    logger.info("Seeded Firebase with workflows")
```

## MCP Server Integration
Self-Discovery Endpoint:
```python
# apps/backend/mcp/prompt_server.py
@mcp_router.get("/discover")
async def mcp_discover():
    """
    Self-discovery for other AI agents.

    Returns:
        - What workflows are available (from seeds)
        - What tools are available
        - How to communicate
        - What data has been seeded
    """
    from stores.prompts import get_prompt_store

    store = get_prompt_store()
    await store.initialize()
    catalog = await store.get_catalog()

    return {
        "server": "altsportsdata_prompt_intelligence",
        "version": "1.0.0",
        "protocol": "MCP/RPC",
        "seeded_data": {
            "workflows": len(catalog.get("workflows", [])),
            "prompts": len(catalog.get("prompts", [])),
            "examples": 800,
            "seed_sources": [
                "database/prompts/*.md",
                "database/output-styles/*/examples/",
                "database/output-styles/*/schema/"
            ],
            "last_seeded": "2025-10-13T...",
            "databases_seeded": ["firebase", "supabase", "inmemorystore"]
        },
        "capabilities": {
            "search_prompts": {
                "description": "Semantic search across 135 seeded prompts",
                "method": "POST /mcp/rpc",
                "params": {
                    "method": "prompts/search",
                    "params": {"query": "string"}
                }
            },
            "execute_workflow": {
                "description": "Execute seeded 9-stage workflow",
                "available_workflows": [w["name"] for w in catalog["workflows"]],
                "method": "POST /mcp/rpc"
            },
            "agent_orchestration": {
                "description": "Create agents from seeded prompts + tools",
                "method": "POST /mcp/rpc",
                "params": {
                    "method": "agent/orchestrate",
                    "params": {"workflow": "string", "create_agents": True}
                }
            }
        },
        "improvement_protocol": {
            "description": "System learns and improves seed data",
            "method": "POST /api/prompts/update",
            "effect": "Updates InMemoryStore + re-seeds databases",
            "persistence": "Improved seeds persist across restarts"
        }
    }
```

## Agent-to-Agent Communication (Using Seeded Prompts)
### Scenario: Multi-Agent Workflow

```
# Agent A discovers what prompts are available
GET /mcp/discover
        ↓
Returns: "questionnaire_to_contract" workflow (seeded from files)
         With 9 stages, each with prompt + tools + examples

# Agent A requests workflow coordination
POST /mcp/rpc
{
  "method": "agent/coordinate",
  "params": {
    "workflow": "questionnaire_to_contract",
    "requesting_agent": "agent_a",
    "mode": "multi_agent_parallel"
  }
}

# System responds with assignments (from the seeded workflow):
{
  "assignments": {
    "agent_a_stage_2": {
      "prompt": "...extraction prompt from seeds...",
      "tools": ["pdf_processor"],
      "examples": [...562 seeded examples...],
      "next_agent": "agent_a_stage_3"
    },
    "agent_a_stage_3": {
      "prompt": "...enhancement prompt from seeds...",
      "tools": ["enricher"],
      "examples": [...9 seeded examples...],
      "next_agent": "agent_a_stage_4"
    },
    // ... all 9 stages assigned
  }
}

# Agents execute in parallel/sequence using seeded prompts
# Each agent has:
#   - Prompt from seeds
#   - Examples from seeds
#   - Schema from seeds
#   - Tools assigned by the orchestrator
```

## Seeding Tables (Supabase Schema)
```sql
-- League examples (seeded from database/output-styles/)
CREATE TABLE league_examples (
    id SERIAL PRIMARY KEY,
    sport TEXT NOT NULL,               -- basketball, soccer, etc.
    tier TEXT,                         -- premium, professional, standard
    stage TEXT,                        -- stage_2, stage_3, etc.
    example_data JSONB,                -- Complete example
    source_file TEXT,                  -- Where it was seeded from
    seeded_at TIMESTAMP DEFAULT NOW(),
    version INT DEFAULT 1
);

-- Workflow definitions (seeded from database/output-styles/)
CREATE TABLE workflow_definitions (
    workflow_name TEXT PRIMARY KEY,
    total_stages INT,
    stages JSONB,                      -- All stage configs
    source_path TEXT,                  -- database/output-styles/...
    seeded_at TIMESTAMP,
    version INT DEFAULT 1
);

-- Prompt catalog (seeded from database/prompts/)
CREATE TABLE prompt_catalog (
    prompt_type TEXT,
    prompt_name TEXT,
    prompt_content TEXT,
    source_file TEXT,                  -- database/prompts/agents/...
    metadata JSONB,
    seeded_at TIMESTAMP,
    version INT DEFAULT 1,
    PRIMARY KEY (prompt_type, prompt_name)
);
```
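The `league_examples` table above is what stage agents query when assembling filtered few-shot context; the selection reduces to a simple predicate filter over sport/tier/stage. A pure-Python sketch of that logic, with illustrative inline records rather than the real dataset:

```python
# Illustrative in-memory version of:
#   SELECT example_data FROM league_examples WHERE sport = ? AND tier = ? AND stage = ?
league_examples = [
    {"sport": "basketball", "tier": "premium",  "stage": "stage_2", "example_data": {"id": 1}},
    {"sport": "basketball", "tier": "standard", "stage": "stage_2", "example_data": {"id": 2}},
    {"sport": "soccer",     "tier": "premium",  "stage": "stage_2", "example_data": {"id": 3}},
]

def filter_examples(rows, sport, tier, stage):
    """Return the example payloads matching a sport/tier/stage combination."""
    return [
        row["example_data"]
        for row in rows
        if row["sport"] == sport and row["tier"] == tier and row["stage"] == stage
    ]

print(filter_examples(league_examples, "basketball", "premium", "stage_2"))  # [{'id': 1}]
```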
```sql
-- Usage tracking (populated during runtime)
CREATE TABLE prompt_usage (
    id SERIAL PRIMARY KEY,
    prompt_name TEXT,
    used_by_agent TEXT,
    success BOOLEAN,
    execution_time FLOAT,
    used_at TIMESTAMP DEFAULT NOW()
);
```

## Complete System Flow Example
### Scenario: NBA Questionnaire Processing

#### Step 1: AI Agent Discovers the Server

```
GET /mcp/discover
```

Response:

```json
{
  "seeded_data": {
    "workflows": 1,
    "examples": 800,
    "databases_seeded": ["firebase", "supabase", "inmemorystore"]
  },
  "capabilities": {
    "execute_workflow": {
      "available_workflows": ["questionnaire_to_contract"]
    }
  }
}
```

#### Step 2: AI Agent Requests the Workflow via MCP RPC
```
POST /mcp/rpc
{
  "method": "prompts/execute",
  "params": {
    "workflow": "questionnaire_to_contract",
    "input_data": {"file": "nba_questionnaire.pdf"},
    "create_agents": true
  }
}
```

#### Step 3: System Retrieves the Workflow (From Seeds)
```python
# Check InMemoryStore first
workflow = store.get(("workflows",), "questionnaire_to_contract")
# → HIT: loaded from DB seeds at startup (<1ms)

# The workflow contains (all from seed files):
{
    "stages": [
        {
            "name": "stage_2_extraction",
            "prompt": "... (from database/output-styles/stage_2/README.md)",
            "examples": [...562 examples from stage_2/examples/*.json...],
            "schema": {...from stage_2/schema/*.json...}
        },
        # ... 8 more stages (all from seeds)
    ]
}
```

#### Step 4: Orchestrator Creates Agents (With Seeded Data)
```python
# services/orchestrator.py
agents = []
for stage in workflow["stages"]:
    agent = create_agent(
        prompt=stage["prompt"],      # From seeds
        tools=assign_tools(stage),   # From tool registry
        examples=stage["examples"],  # From seeds
        schema=stage["schema"],      # From seeds
    )
    agents.append(agent)

# Returns 9 agents, each with:
#   - Seeded prompt
#   - Seeded examples
#   - Seeded schema
#   - Assigned tools
```

#### Step 5: Execute with the Agent Crew
```python
# Each agent executes its stage; state starts as the workflow input
state = input_data
for agent in agents:
    state = await agent.execute(
        input=state,
        prompt=agent.prompt,       # From seeds
        examples=agent.examples,   # From seeds
        tools=agent.tools,
    )
# state now holds the final contract
```

#### Step 6: System Learns (Re-Seeds)
```
# If the user provides feedback
POST /api/prompts/update
{
  "suggestions": ["NBA pricing too low", "Add luxury tax"]
}

# The system:
# 1. Updates InMemoryStore (v1 → v2)
# 2. Re-seeds Supabase (updated examples)
# 3. Re-seeds Firebase (updated workflow)
# 4. The next execution uses the v2 seeds
```

## Final Architecture Benefits
### Seeded Intelligence

- Databases pre-populated from 135 prompts + 800 examples
- Fast queries (data already in the DBs)
- Consistent across restarts

### Cached Performance

- InMemoryStore: <1ms retrieval
- DB seeds: ~20ms on a cache miss
- File fallback: ~9ms worst case

### Self-Improving

- Update prompts via the API
- Re-seeds databases automatically
- Next startup loads the improved seeds

### Multi-Interface

- REST API (humans/frontends)
- MCP server (AI-to-AI)
- Agent orchestration (multi-agent)

### Production Reliable

- Seeds persist in databases
- Multi-tier fallback
- Graceful degradation
- Cross-instance sync
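The multi-tier fallback and graceful degradation claims boil down to trying each tier in order and swallowing per-tier failures. A minimal sketch — the tier callables are stand-ins for the InMemoryStore, database, and seed-file lookups, not the real clients:

```python
def get_with_fallback(name, tiers):
    """Try each tier in order; a failing or empty tier degrades to the next one."""
    for tier_name, fetch in tiers:
        try:
            value = fetch(name)
        except Exception:
            continue  # e.g., DB unreachable: degrade gracefully to the next tier
        if value is not None:
            return tier_name, value
    raise LookupError(f"{name} not found in any tier")

def broken_db(name):
    raise ConnectionError("database unreachable")

tiers = [
    ("memory", {}.get),                            # empty cache: miss
    ("database", broken_db),                       # simulated outage: skipped
    ("files", {"pdf_agent": {"version": 1}}.get),  # seed files as last resort
]

print(get_with_fallback("pdf_agent", tiers))  # ('files', {'version': 1})
```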
## Complete File Checklist

Core System (Implemented):

- `apps/backend/stores/prompts.py` - InMemoryStore + DB sync
- `apps/backend/services/prompts.py` - Workflow execution
- `apps/backend/api/prompts.py` - REST API
- `apps/backend/server.py` - Integration
- `database/scripts/build.py` - Seeding script
- `database/scripts/validate.py` - Tests (7/7 passing)

Enhancements (Next Phase):

- Enhanced seeding in `build.py` (Supabase + Firebase)
- `apps/backend/mcp/prompt_server.py` - MCP server
- `apps/backend/services/orchestrator.py` - Agent creation
- `apps/backend/services/agent_communication.py` - Agent protocol
## Current Status

What works now:

- InMemoryStore caching (<1ms)
- Build from files (seeding)
- REST API (5 endpoints)
- Workflow execution
- Update with suggestions
- Database sync (Firebase + Supabase)

Ready to add:

- Enhanced DB seeding (Supabase examples, Firebase workflows)
- MCP server (AI-to-AI communication)
- Agent orchestration (create agents before execution)
- Agent-to-agent protocol (workflow coordination)
## Key Insight
This system treats prompts as living, evolving intelligence:
- Born from seed files (database/prompts/)
- Cached in memory (InMemoryStore) for speed
- Persisted in databases (seeded for reliability)
- Improved via feedback (re-seeded automatically)
- Shared via MCP (AI-to-AI collaboration)
- Orchestrated for complex workflows (multi-agent)
Result: a self-improving, multi-interface prompt intelligence system that gets better over time.
See: database/COMPLETE_ARCHITECTURE.md (this file) for full picture.
Start: START_HERE_PROMPT_INTELLIGENCE.md for quick setup.
Deploy: System ready for Google Cloud Run with seeded databases.