Source: data_layer/docs/COMPLETE_SYSTEM_ARCHITECTURE.md
# Prompt Intelligence System - Complete Architecture

## The Complete Picture

A self-improving prompt intelligence system that:

- Seeds from static files (`database/prompts/`)
- Caches in LangMem (`InMemoryStore`)
- Persists to databases (Firebase + Supabase)
- Improves over time (update → re-cache → re-seed)
- Serves via REST API + MCP server
- Orchestrates multi-agent workflows

## Complete Data Flow Cycle
```
┌──────────────────────────────────────────────────────────┐
│                   SEED PHASE (Startup)                   │
└──────────────────────────────────────────────────────────┘
database/prompts/*.md      (135 files)      - SEEDS
database/output-styles/    (800+ examples)  - SEEDS
        ↓ Read and parse
Build workflows with examples + schemas
        ↓ Seed phase
    ┌────────────┬────────────┬──────────────┬────────────┐
    ↓            ↓            ↓              ↓
Supabase      Firebase     InMemoryStore   ChromaDB
(League data) (User data)  (Fast cache)    (Optional)
• Examples    • Workflows  • All prompts   • Vector search
• Analytics   • Preferences • <1ms access    backup
```
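The seed-phase fan-out above (one parsed record written to every store) amounts to a broadcast write. A minimal sketch — the sink callables below are illustrative stand-ins for the Supabase, Firebase, and InMemoryStore clients, not the real APIs:

```python
def seed_all(record, sinks):
    """Broadcast one parsed seed record to every configured store."""
    for name, write in sinks.items():
        write(record)  # each backend receives the same parsed seed

# Stand-in backends for illustration
supabase_rows, firebase_docs, memory_cache = [], [], {}
sinks = {
    "supabase": supabase_rows.append,                              # league examples
    "firebase": firebase_docs.append,                              # workflows/catalog
    "inmemory": lambda r: memory_cache.__setitem__(r["name"], r),  # fast cache
}

seed_all({"name": "pdf_agent", "version": 1}, sinks)
print(len(supabase_rows), len(firebase_docs), "pdf_agent" in memory_cache)  # 1 1 True
```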
```
┌──────────────────────────────────────────────────────────┐
│                  QUERY PHASE (Runtime)                   │
└──────────────────────────────────────────────────────────┘
User/AI queries for prompt
        ↓
Check InMemoryStore (<1ms)
  ├─ HIT:  Return immediately ⚡
  └─ MISS: Check databases
       ├─ Check Supabase (league data)    ~20ms
       ├─ Check Firebase (user/workflow)  ~20ms
       └─ Fallback to files               ~9ms
        ↓
Cache result in InMemoryStore
        ↓
Future queries: <1ms ⚡
```
```
┌──────────────────────────────────────────────────────────┐
│               IMPROVEMENT PHASE (Continuous)             │
└──────────────────────────────────────────────────────────┘
User provides feedback/suggestions
        ↓
POST /api/prompts/update
{"suggestions": ["Add NBA examples", "Improve pricing"]}
        ↓
Get current version from InMemoryStore
        ↓
Apply improvements (increment version)
        ↓
Update InMemoryStore (immediate)
        ↓ Background async
    ┌────────────┬────────────┐
    ↓            ↓            ↓
Firebase     Supabase     Re-seed flag
(Updated)    (Analytics)  (For rebuild)
    ↓            ↓            ↓
Next query   Performance  Next startup
uses v2      tracking     uses v2
```

## Five-Layer Architecture
### Layer 0: SEED FILES (database/)

Purpose: Source of truth, Git versioned, human-editable

```
database/
├── prompts/
│   ├── agents/*.md       # 28 agent seed prompts
│   ├── commands/*.md     # 26 command seed prompts
│   └── workflows/*.md    # Workflow seed prompts
│
└── output-styles/
    └── league_questionnaire_to_contract/
        ├── stage_2/  (examples, schema) - 562 examples
        ├── stage_3/  (examples, schema) - 9 examples
        ├── stage_4/  (examples, schema) - 3 examples
        ├── stage_5/  (examples, schema) - 9 examples
        ├── stage_6/  (examples, schema) - 55 examples
        ├── stage_7a/ (examples, schema) - 153 examples
        └── stage_7b/ (examples, schema) - 22 examples
```

TOTAL: 135 prompt seeds, 800+ example seeds

### Layer 1: DATABASES (Seeded from files)
Purpose: Persistent storage, seeded at startup, updated on improvements
Supabase (League-specific data):

```sql
-- Seeded from database/output-styles/
CREATE TABLE league_examples (
    sport TEXT,
    tier TEXT,
    stage TEXT,
    example_data JSONB,
    source_file TEXT  -- Where it was seeded from
);

-- Populated with 800+ examples: the build script parses
-- database/output-styles/*/examples/*.json and inserts one row per example.
```

Firebase (User + workflow data):
```
// Seeded from database/prompts/
prompts/
  workflows/
    questionnaire_to_contract: {
      stages: [...],
      seeded_from: "database/output-styles/",
      version: 1
    }
  catalog/
    all_prompts: [...135 prompts...],
    seeded_at: timestamp
```

### Layer 2: LANGMEM CACHE (InMemoryStore)
Purpose: Fast in-memory cache, populated from DBs, <1ms retrieval
```python
# Populated at startup from databases
store = InMemoryStore(index={"embed": "openai:text-embedding-3-small"})

# Namespaces
("workflows", "questionnaire_to_contract")  # → Full 9-stage workflow
("prompts", "pdf_agent")                    # → Individual agent prompts
("examples", "premium_basketball")          # → Filtered examples
("schemas", "ExtractedData")                # → Validation schemas
```

Data source priority:

- If in InMemoryStore → return (<1ms)
- If in Firebase/Supabase → cache and return (~20ms)
- If only in files → build, seed DBs, cache, return (~9ms)
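This priority order is a read-through cache. A minimal pure-Python sketch of the pattern — the dict-backed `cache` and `db` here are illustrative stand-ins for InMemoryStore and Firebase/Supabase:

```python
cache = {}  # stands in for InMemoryStore (<1ms)
db = {"pdf_agent": {"prompt": "Extract fields from the PDF...", "version": 1}}  # stands in for Firebase/Supabase (~20ms)

def get_prompt(name):
    """Return a prompt record, caching backing-store hits for future fast reads."""
    if name in cache:          # tier 1: in-memory hit
        return cache[name]
    value = db.get(name)       # tier 2: database lookup
    if value is not None:
        cache[name] = value    # populate the cache on a miss
    return value               # None would mean "fall back to seed files"

first = get_prompt("pdf_agent")   # miss: fetched from db, then cached
second = get_prompt("pdf_agent")  # hit: served from the cache
print(first is second)            # True - both reads return the cached record
```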
### Layer 3: SERVICES (apps/backend/services/)

Purpose: Business logic, workflow orchestration, agent creation

```
services/
├── prompts.py                 # Workflow execution
│     • Get from store
│     • Build LangGraph
│     • Execute pipeline
│     • Track analytics
│
├── orchestrator.py            # Agent creation
│     • Parse workflow stages
│     • Create agents with tools
│     • Coordinate execution
│
└── agent_communication.py     # Agent-to-agent
      • Message routing
      • Context sharing
      • Workflow coordination
```

### Layer 4: API (apps/backend/api/)
Purpose: REST endpoints for humans + frontends
```
api/prompts.py:
GET  /api/prompts/catalog   # List all seeds
POST /api/prompts/search    # Semantic search
POST /api/prompts/update    # Improve (re-seed)
POST /api/prompts/execute   # Run workflow
POST /api/prompts/batch     # Parallel execution
```

### Layer 5: MCP SERVER (apps/backend/mcp/)
Purpose: AI-to-AI communication, self-discovery
```
mcp/prompt_server.py:
GET  /mcp/discover            # What can you do?
POST /mcp/rpc                 # Execute via RPC
POST /mcp/agent/communicate   # Agent-to-agent
```

## Seeding Strategy (Complete)
### Phase 1: Initial Seed (Deployment)

```
# database/scripts/build.py (ENHANCED)

1. Read seed files:
   ├─ database/prompts/agents/*.md              (28 seeds)
   ├─ database/prompts/commands/*.md            (26 seeds)
   └─ database/output-styles/*/examples/*.json  (800+ seeds)

2. Seed Supabase (league-specific):
   ├─ league_examples (800+ rows)
   │     GROUP BY sport (basketball, soccer, hockey...)
   │     GROUP BY tier (premium, professional, standard)
   │     GROUP BY stage (stage_2 through stage_7)
   │
   ├─ workflow_definitions (workflow metadata)
   └─ prompt_catalog (all 135 prompts indexed)

3. Seed Firebase (user-specific + workflows):
   ├─ prompts/workflows (critical workflows)
   ├─ prompts/catalog (all 135 prompts)
   └─ system/seed_info (when seeded, from what version)

4. Populate InMemoryStore (fast cache):
   └─ All workflows, prompts, examples in memory
```

### Phase 2: Query Phase (Runtime)
```python
# When a query arrives
async def get_prompt(name: str):
    # 1. Check InMemoryStore (cache)
    result = store.get(("workflows",), name)
    if result:
        return result.value  # <1ms ⚡

    # 2. Check Supabase (seeded league data)
    db_result = await supabase.table("workflow_definitions").select("*").eq("name", name).single().execute()
    if db_result.data:
        store.put(("workflows",), name, db_result.data)  # Cache it
        return db_result.data  # ~20ms

    # 3. Check Firebase (seeded user/workflow data)
    fb_result = await firebase.get(f"prompts/workflows/{name}")
    if fb_result:
        store.put(("workflows",), name, fb_result)  # Cache it
        return fb_result  # ~20ms

    # 4. Build from seed files (fallback)
    file_result = build_from_database_files(name)
    store.put(("workflows",), name, file_result)  # Cache it
    await seed_to_databases(file_result)          # Seed for next time
    return file_result  # ~9ms
```

### Phase 3: Improvement Phase (Continuous)
```python
# When a user improves a prompt
async def update_prompt(name: str, suggestions: List[str]):
    # 1. Get the current version from InMemoryStore
    current = store.get(("workflows",), name).value

    # 2. Apply improvements
    improved = {
        **current,
        "suggestions_applied": suggestions,
        "version": current["version"] + 1,
        "updated_at": datetime.now(),
    }

    # 3. Update InMemoryStore (immediate)
    store.put(("workflows",), name, improved)

    # 4. Re-seed databases (background)
    await supabase.table("workflow_definitions").upsert(improved).execute()  # Update seed
    await firebase.set(f"prompts/workflows/{name}", improved)                # Update seed

    # 5. The next query gets the improved version
    # 6. On container restart, the improved version loads from the DB
```

## Agent Orchestration Flow
### Example: Execute `questionnaire_to_contract`

```
# User/AI request
POST /api/prompts/execute
{
  "workflow": "questionnaire_to_contract",
  "input_data": {"questionnaire_text": "..."},
  "use_orchestrator": true   # Create agents before execution
}
```

```python
# apps/backend/api/prompts.py
if use_orchestrator:
    # Use the orchestrator service
    result = await orchestrator.orchestrate_workflow(workflow, input_data)
else:
    # Direct execution
    result = await service.execute_workflow(workflow, input_data)
```

```
# apps/backend/services/orchestrator.py

1. Get workflow from store (<1ms from InMemoryStore/DB seeds)
   └─ Returns: 9 stages with prompts + examples + schemas

2. Create specialized agents (one per stage):
   ├─ Agent 1: "ExtractAgent"
   │    ├─ Prompt:   stage_2 prompt (from seeds)
   │    ├─ Tools:    [pdf_processor, ocr_tool]
   │    └─ Examples: 562 extraction examples (from seeds)
   │
   ├─ Agent 2: "EnhanceAgent"
   │    ├─ Prompt:   stage_3 prompt (from seeds)
   │    ├─ Tools:    [data_enricher, api_fetcher]
   │    └─ Examples: 9 enhancement examples (from seeds)
   │
   ├─ ... (6 more agents for the intermediate stages)
   │
   └─ Agent 9: "ExportAgent"
        ├─ Prompt:   stage_7b prompt (from seeds)
        ├─ Tools:    [markdown_formatter, pdf_generator]
        └─ Examples: 22 export examples (from seeds)

3. Execute workflow with the agent crew:
   ├─ Agent 1 processes with its tools
   ├─ Passes output to Agent 2
   ├─ Agent 2 processes with its tools
   ├─ ... sequential or parallel
   └─ Agent 9 produces the final output

4. Return result
```

## Intelligence Evolution (Seed → Improve → Re-Seed)
Lifecycle:

```
GENERATION 1 (Initial):
─────────────────────────
Seeds: database/prompts/*.md (v1)
        ↓ Build at startup
InMemoryStore: Cached (v1)
        ↓ Seed DBs
Supabase/Firebase: Seeded (v1)
        ↓ Execute workflows
Performance: 85% success rate

USER FEEDBACK:
─────────────────────────
"NBA tier classification is inaccurate"
"Need more basketball examples"

POST /api/prompts/update
{
  "suggestions": [
    "Add 20 NBA-specific examples",
    "Improve tier logic for major leagues",
    "Include revenue thresholds"
  ]
}

GENERATION 2 (Improved):
─────────────────────────
InMemoryStore: Updated (v2)        ← Immediate
        ↓ Background sync
Supabase/Firebase: Re-seeded (v2)
        ↓ Future restarts
Container restart: Loads v2 from DB
        ↓ Execute workflows
Performance: 92% success rate      ← Improved!

GENERATION 3 (Further refined):
─────────────────────────
[Cycle continues...]
```

### Key Point: Seeds Evolve Over Time
- Start: Static seeds from files
- Improve: Update via API
- Re-seed: Databases get updated versions
- Persist: Next startup loads improved seeds
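One detail this cycle implies: at startup, the loader must prefer a higher-versioned database seed over the static file seed, otherwise improvements would be silently overwritten on every restart. A minimal sketch of that rule — function and field names here are illustrative, not the actual build.py API:

```python
def choose_seed(file_seed, db_seed):
    """Prefer the improved DB copy over the static file seed when it is newer."""
    if db_seed is None:  # first boot: nothing has been seeded yet
        return file_seed
    if db_seed.get("version", 1) >= file_seed.get("version", 1):
        return db_seed   # keep the accumulated improvements
    return file_seed     # the file seed was bumped past the DB copy

file_seed = {"name": "pdf_agent", "version": 1}
db_seed = {"name": "pdf_agent", "version": 2, "suggestions_applied": ["Add NBA examples"]}

print(choose_seed(file_seed, db_seed)["version"])  # 2 - restart keeps the improved seed
print(choose_seed(file_seed, None)["version"])     # 1 - first boot uses the file seed
```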
## Database Seeding Detail
Supabase Seeding (League-Specific Data):

```python
# database/scripts/build.py - Seed Supabase
async def seed_supabase_from_files():
    """
    Seed Supabase with league examples,
    organized by sport/tier for fast filtering.
    """
    pipeline_path = Path("database/output-styles/league_questionnaire_to_contract")

    for stage_dir in pipeline_path.glob("league_questionnaire_*"):
        examples_dir = stage_dir / "examples"
        if examples_dir.exists():
            for example_file in examples_dir.glob("*.json"):
                example = json.loads(example_file.read_text())

                # Extract metadata
                sport = example.get("sport", "general")
                tier = example.get("tier", "standard")
                stage = stage_dir.name

                # Seed to Supabase
                await supabase.table("league_examples").upsert({
                    "sport": sport,
                    "tier": tier,
                    "stage": stage,
                    "example_data": example,
                    "source_file": str(example_file),
                    "seeded_at": datetime.now().isoformat(),
                }).execute()

    logger.info("Seeded Supabase with 800+ examples")
```

Firebase Seeding (Workflows + User Data):
```python
# database/scripts/build.py - Seed Firebase
async def seed_firebase_from_files():
    """
    Seed Firebase with workflows and the prompt catalog.
    Critical for cross-instance sync.
    """
    # Build workflow
    workflow = await build_workflow_from_files("questionnaire_to_contract")

    # Seed to Firebase
    await firebase.db.collection("prompts").document("workflows").set({
        "questionnaire_to_contract": workflow,
        "_metadata": {
            "seeded_from": "database/output-styles/",
            "seeded_at": datetime.now().isoformat(),
            "total_stages": len(workflow.get("stages", [])),
            "example_count": sum(len(s.get("examples", [])) for s in workflow["stages"]),
        }
    })

    logger.info("Seeded Firebase with workflows")
```

## MCP Server Integration
Self-Discovery Endpoint:
```python
# apps/backend/mcp/prompt_server.py
@mcp_router.get("/discover")
async def mcp_discover():
    """
    Self-discovery for other AI agents.

    Returns:
        - What workflows are available (from seeds)
        - What tools are available
        - How to communicate
        - What data has been seeded
    """
    from stores.prompts import get_prompt_store

    store = get_prompt_store()
    await store.initialize()
    catalog = await store.get_catalog()

    return {
        "server": "altsportsdata_prompt_intelligence",
        "version": "1.0.0",
        "protocol": "MCP/RPC",
        "seeded_data": {
            "workflows": len(catalog.get("workflows", [])),
            "prompts": len(catalog.get("prompts", [])),
            "examples": 800,
            "seed_sources": [
                "database/prompts/*.md",
                "database/output-styles/*/examples/",
                "database/output-styles/*/schema/"
            ],
            "last_seeded": "2025-10-13T...",
            "databases_seeded": ["firebase", "supabase", "inmemorystore"]
        },
        "capabilities": {
            "search_prompts": {
                "description": "Semantic search across 135 seeded prompts",
                "method": "POST /mcp/rpc",
                "params": {
                    "method": "prompts/search",
                    "params": {"query": "string"}
                }
            },
            "execute_workflow": {
                "description": "Execute seeded 9-stage workflow",
                "available_workflows": [w["name"] for w in catalog["workflows"]],
                "method": "POST /mcp/rpc"
            },
            "agent_orchestration": {
                "description": "Create agents from seeded prompts + tools",
                "method": "POST /mcp/rpc",
                "params": {
                    "method": "agent/orchestrate",
                    "params": {"workflow": "string", "create_agents": True}
                }
            }
        },
        "improvement_protocol": {
            "description": "System learns and improves seed data",
            "method": "POST /api/prompts/update",
            "effect": "Updates InMemoryStore + re-seeds databases",
            "persistence": "Improved seeds persist across restarts"
        }
    }
```

## Agent-to-Agent Communication (Using Seeded Prompts)
### Scenario: Multi-Agent Workflow

```
# Agent A discovers what prompts are available
GET /mcp/discover
        ↓
Returns: "questionnaire_to_contract" workflow (seeded from files)
         With 9 stages, each with prompt + tools + examples

# Agent A requests workflow coordination
POST /mcp/rpc
{
  "method": "agent/coordinate",
  "params": {
    "workflow": "questionnaire_to_contract",
    "requesting_agent": "agent_a",
    "mode": "multi_agent_parallel"
  }
}

# System responds with assignments (from the seeded workflow):
{
  "assignments": {
    "agent_a_stage_2": {
      "prompt": "...extraction prompt from seeds...",
      "tools": ["pdf_processor"],
      "examples": [...562 seeded examples...],
      "next_agent": "agent_a_stage_3"
    },
    "agent_a_stage_3": {
      "prompt": "...enhancement prompt from seeds...",
      "tools": ["enricher"],
      "examples": [...9 seeded examples...],
      "next_agent": "agent_a_stage_4"
    },
    // ... all 9 stages assigned
  }
}

# Agents execute in parallel/sequence using seeded prompts
# Each agent has:
#   - Prompt from seeds
#   - Examples from seeds
#   - Schema from seeds
#   - Tools assigned by the orchestrator
```

## Seeding Tables (Supabase Schema)
```sql
-- League examples (seeded from database/output-styles/)
CREATE TABLE league_examples (
    id SERIAL PRIMARY KEY,
    sport TEXT NOT NULL,               -- basketball, soccer, etc.
    tier TEXT,                         -- premium, professional, standard
    stage TEXT,                        -- stage_2, stage_3, etc.
    example_data JSONB,                -- Complete example
    source_file TEXT,                  -- Where it was seeded from
    seeded_at TIMESTAMP DEFAULT NOW(),
    version INT DEFAULT 1
);

-- Workflow definitions (seeded from database/output-styles/)
CREATE TABLE workflow_definitions (
    workflow_name TEXT PRIMARY KEY,
    total_stages INT,
    stages JSONB,                      -- All stage configs
    source_path TEXT,                  -- database/output-styles/...
    seeded_at TIMESTAMP,
    version INT DEFAULT 1
);

-- Prompt catalog (seeded from database/prompts/)
CREATE TABLE prompt_catalog (
    prompt_type TEXT,
    prompt_name TEXT,
    prompt_content TEXT,
    source_file TEXT,                  -- database/prompts/agents/...
    metadata JSONB,
    seeded_at TIMESTAMP,
    version INT DEFAULT 1,
    PRIMARY KEY (prompt_type, prompt_name)
);
```
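The `league_examples` table above is what stage agents query when assembling filtered few-shot context; the selection reduces to a simple predicate filter over sport/tier/stage. A pure-Python sketch of that logic, with illustrative inline records rather than the real dataset:

```python
# Illustrative in-memory version of:
#   SELECT example_data FROM league_examples WHERE sport = ? AND tier = ? AND stage = ?
league_examples = [
    {"sport": "basketball", "tier": "premium",  "stage": "stage_2", "example_data": {"id": 1}},
    {"sport": "basketball", "tier": "standard", "stage": "stage_2", "example_data": {"id": 2}},
    {"sport": "soccer",     "tier": "premium",  "stage": "stage_2", "example_data": {"id": 3}},
]

def filter_examples(rows, sport, tier, stage):
    """Return the example payloads matching a sport/tier/stage combination."""
    return [
        row["example_data"]
        for row in rows
        if row["sport"] == sport and row["tier"] == tier and row["stage"] == stage
    ]

print(filter_examples(league_examples, "basketball", "premium", "stage_2"))  # [{'id': 1}]
```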
```sql
-- Usage tracking (populated during runtime)
CREATE TABLE prompt_usage (
    id SERIAL PRIMARY KEY,
    prompt_name TEXT,
    used_by_agent TEXT,
    success BOOLEAN,
    execution_time FLOAT,
    used_at TIMESTAMP DEFAULT NOW()
);
```

## Complete System Flow Example
### Scenario: NBA Questionnaire Processing

#### Step 1: AI Agent Discovers the Server

```
GET /mcp/discover
```

Response:

```json
{
  "seeded_data": {
    "workflows": 1,
    "examples": 800,
    "databases_seeded": ["firebase", "supabase", "inmemorystore"]
  },
  "capabilities": {
    "execute_workflow": {
      "available_workflows": ["questionnaire_to_contract"]
    }
  }
}
```

#### Step 2: AI Agent Requests the Workflow via MCP RPC
```
POST /mcp/rpc
{
  "method": "prompts/execute",
  "params": {
    "workflow": "questionnaire_to_contract",
    "input_data": {"file": "nba_questionnaire.pdf"},
    "create_agents": true
  }
}
```

#### Step 3: System Retrieves the Workflow (From Seeds)
```python
# Check InMemoryStore first
workflow = store.get(("workflows",), "questionnaire_to_contract")
# → HIT: loaded from DB seeds at startup (<1ms)

# The workflow contains (all from seed files):
{
    "stages": [
        {
            "name": "stage_2_extraction",
            "prompt": "... (from database/output-styles/stage_2/README.md)",
            "examples": [...562 examples from stage_2/examples/*.json...],
            "schema": {...from stage_2/schema/*.json...}
        },
        # ... 8 more stages (all from seeds)
    ]
}
```

#### Step 4: Orchestrator Creates Agents (With Seeded Data)
```python
# services/orchestrator.py
agents = []
for stage in workflow["stages"]:
    agent = create_agent(
        prompt=stage["prompt"],      # From seeds
        tools=assign_tools(stage),   # From tool registry
        examples=stage["examples"],  # From seeds
        schema=stage["schema"],      # From seeds
    )
    agents.append(agent)

# Returns 9 agents, each with:
#   - Seeded prompt
#   - Seeded examples
#   - Seeded schema
#   - Assigned tools
```

#### Step 5: Execute with the Agent Crew
```python
# Each agent executes its stage; state starts as the workflow input
state = input_data
for agent in agents:
    state = await agent.execute(
        input=state,
        prompt=agent.prompt,       # From seeds
        examples=agent.examples,   # From seeds
        tools=agent.tools,
    )
# state now holds the final contract
```

#### Step 6: System Learns (Re-Seeds)
```
# If the user provides feedback
POST /api/prompts/update
{
  "suggestions": ["NBA pricing too low", "Add luxury tax"]
}

# The system:
# 1. Updates InMemoryStore (v1 → v2)
# 2. Re-seeds Supabase (updated examples)
# 3. Re-seeds Firebase (updated workflow)
# 4. The next execution uses the v2 seeds
```

## Final Architecture Benefits
### Seeded Intelligence

- Databases pre-populated from 135 prompts + 800 examples
- Fast queries (data already in the DBs)
- Consistent across restarts

### Cached Performance

- InMemoryStore: <1ms retrieval
- DB seeds: ~20ms on a cache miss
- File fallback: ~9ms worst case

### Self-Improving

- Update prompts via the API
- Re-seeds databases automatically
- Next startup loads the improved seeds

### Multi-Interface

- REST API (humans/frontends)
- MCP server (AI-to-AI)
- Agent orchestration (multi-agent)

### Production Reliable

- Seeds persist in databases
- Multi-tier fallback
- Graceful degradation
- Cross-instance sync
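The multi-tier fallback and graceful degradation claims boil down to trying each tier in order and swallowing per-tier failures. A minimal sketch — the tier callables are stand-ins for the InMemoryStore, database, and seed-file lookups, not the real clients:

```python
def get_with_fallback(name, tiers):
    """Try each tier in order; a failing or empty tier degrades to the next one."""
    for tier_name, fetch in tiers:
        try:
            value = fetch(name)
        except Exception:
            continue  # e.g., DB unreachable: degrade gracefully to the next tier
        if value is not None:
            return tier_name, value
    raise LookupError(f"{name} not found in any tier")

def broken_db(name):
    raise ConnectionError("database unreachable")

tiers = [
    ("memory", {}.get),                            # empty cache: miss
    ("database", broken_db),                       # simulated outage: skipped
    ("files", {"pdf_agent": {"version": 1}}.get),  # seed files as last resort
]

print(get_with_fallback("pdf_agent", tiers))  # ('files', {'version': 1})
```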
## Complete File Checklist

Core System (Implemented):

- `apps/backend/stores/prompts.py` - InMemoryStore + DB sync
- `apps/backend/services/prompts.py` - Workflow execution
- `apps/backend/api/prompts.py` - REST API
- `apps/backend/server.py` - Integration
- `database/scripts/build.py` - Seeding script
- `database/scripts/validate.py` - Tests (7/7 passing)

Enhancements (Next Phase):

- Enhanced seeding in `build.py` (Supabase + Firebase)
- `apps/backend/mcp/prompt_server.py` - MCP server
- `apps/backend/services/orchestrator.py` - Agent creation
- `apps/backend/services/agent_communication.py` - Agent protocol
## Current Status

What works now:

- InMemoryStore caching (<1ms)
- Build from files (seeding)
- REST API (5 endpoints)
- Workflow execution
- Update with suggestions
- Database sync (Firebase + Supabase)

Ready to add:

- Enhanced DB seeding (Supabase examples, Firebase workflows)
- MCP server (AI-to-AI communication)
- Agent orchestration (create agents before execution)
- Agent-to-agent protocol (workflow coordination)
## Key Insight
This system treats prompts as living, evolving intelligence:
- Born from seed files (database/prompts/)
- Cached in memory (InMemoryStore) for speed
- Persisted in databases (seeded for reliability)
- Improved via feedback (re-seeded automatically)
- Shared via MCP (AI-to-AI collaboration)
- Orchestrated for complex workflows (multi-agent)
Result: a self-improving, multi-interface prompt intelligence system that gets better over time.
See: database/COMPLETE_ARCHITECTURE.md (this file) for full picture.
Start: START_HERE_PROMPT_INTELLIGENCE.md for quick setup.
Deploy: System ready for Google Cloud Run with seeded databases.