Source: data_layer/docs/CLAUDE.md
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Architecture Overview
This is the database layer for the AltSportsLeagues.ai sports partnership intelligence platform. It implements a dual-database architecture: Supabase stores every league, including unverified prospects, while Firebase stores only verified partners. A knowledge base system supports AI-powered interactions.
Core Architecture Principles
Two-Tier Database Strategy:
- Supabase: All leagues (scraped + verified) with source tracking and verification status
- Firebase: Only verified partner leagues with active relationships
- Knowledge Base: Persistent learning data, prompts, and interaction examples
- Context Management: Transient session state and workflow tracking
Key Database Systems
- Supabase (Opportunity Database)
  - `prospective_leagues` table
  - All discovered leagues from web scraping, emails, and forms
  - Opportunity scoring, enrichment data, contact history
  - Source tracking: `web_scrape`, `email_ingest`, `form_submission`, `human_verified`
  - Verification workflow: `unverified` → `contacted` → `human_verified`
- Firebase (Partner Database)
  - `verified_leagues` collection
  - Only human-verified partnerships
  - Contracts, communications, user accounts
  - Real-time updates and Google Sheets sync
- Knowledge Base
  - `seed.examples-kb/`, `kb_catalog/`, `prompts/`
  - Historical interaction examples for AI learning
  - Prompt templates and workflow recipes
  - Schema catalogs and document templates
- PostgreSQL Schema
  - `sql/core-schema.sql`
  - Enhanced pipeline management with stages
  - Opportunity scoring and automation rules
  - Activity tracking and analytics snapshots
Directory Structure
```
database/
├── seed.examples-kb/    # Historical AI interaction examples
├── kb_catalog/          # Schema and prompt catalogs
├── prompts/             # Prompt engineering system
├── ops/                 # Contract builders and workflows
├── output-styles/       # Document generation templates
├── schemas/             # Data structure definitions
│   ├── core/            # Core business schemas
│   ├── models/          # Database models (PostgreSQL, Redshift)
│   └── typescript/      # TypeScript type definitions
├── scripts/             # Database utilities and setup
├── sql/                 # SQL schema files
├── setup/               # Initial setup scripts
└── docs/                # Architecture documentation
```

Essential Commands
Database Setup
```bash
# Supabase setup (required)
# 1. Create a project at https://supabase.com
# 2. Run the SQL migration from schemas/models/postgresql/
# 3. Configure environment variables

# Test the unified database system (the subshell keeps you in the repo root)
(cd apps/backend && python -m services.unified_league_database)

# Initialize the core PostgreSQL schema (run from the repo root)
psql -d your_database -f database/sql/core-schema.sql
```

Working with Leagues
```python
# These helpers are coroutines: run them inside an async function
# (for example via asyncio.run) or an async-aware REPL.

# Add scraped league (Supabase only)
from apps.backend.services.unified_league_database import upsert_scraped_league

result = await upsert_scraped_league({
    "name": "International Basketball League",
    "sport_name": "Basketball",
    "sport_tier": "TIER2",
    "source_url": "https://example.com/ibl",
    "opportunity_score": 75,
})

# Add verified league (both databases)
from apps.backend.services.unified_league_database import upsert_verified_league

result = await upsert_verified_league(
    {"name": "Premier Volleyball League", "sport_name": "Volleyball"},
    user_context={"email": "partner@altsportsdata.com"},
)

# Promote scraped to verified
from apps.backend.services.unified_league_database import UnifiedLeagueDatabase

db = UnifiedLeagueDatabase()
result = await db.promote_to_firebase(
    supabase_league_id="abc-123",
    user_context={"email": "sales@altsportsdata.com"},
)
```

Knowledge Base Operations
```python
# Query knowledge base for examples
from database.seed.examples_kb import api

examples = api.get_examples(
    query="contract generation",
    category="business_deals",
    limit=5,
)

# Build prompt with context
from database.prompts import PromptBuilder

prompt_builder = PromptBuilder()
prompt = prompt_builder.build_with_context(
    template="contract_generation",
    context={"tier": "premium", "sport": "basketball"},
)
```

Database Adapters
```python
# Both adapters are async: call them inside an async function

# Supabase adapter
from apps.backend.services.supabase_adapter import SupabaseAdapter

supabase = SupabaseAdapter()
leagues = await supabase.query_leagues({"sport_name": "Basketball"})

# Firebase adapter
from apps.backend.services.firebase_adapter import FirebaseAdapter

firebase = FirebaseAdapter()
verified = await firebase.get_verified_leagues()
```

Key Concepts
Source Tracking
Every league has a `source_type` that determines Firebase eligibility:

| Source Type | Firebase? | Description |
|---|---|---|
| `web_scrape` | ❌ | Discovered via scraping |
| `human_verified` | ✅ | Verified via human contact |
| `league_owner_registration` | ✅ | Owner self-registered |
| `email_ingest` | ❌ | Extracted from emails |
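The eligibility rule in the table reduces to a set-membership check. A minimal sketch, assuming source types are plain strings; `FIREBASE_ELIGIBLE_SOURCES` and `target_databases` are hypothetical names, not part of `unified_league_database`:

```python
from __future__ import annotations

# Hypothetical constant mirroring the "Firebase?" column of the table
FIREBASE_ELIGIBLE_SOURCES = frozenset({"human_verified", "league_owner_registration"})


def target_databases(source_type: str) -> list[str]:
    """Return the databases a league with this source_type should be written to.

    Every league is stored in Supabase; only human-confirmed sources are also
    written to Firebase. Unrecognised source types default to Supabase only.
    """
    if source_type in FIREBASE_ELIGIBLE_SOURCES:
        return ["supabase", "firebase"]
    return ["supabase"]


if __name__ == "__main__":
    for source in ("web_scrape", "email_ingest", "human_verified", "league_owner_registration"):
        print(f"{source:<26} -> {target_databases(source)}")
```

Treating unknown source types as Supabase-only keeps the default safe: nothing reaches the partner database unless a human-confirmed source put it there.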
Verification Workflow
```
unverified → investigating → contacted → human_verified → partnership_active
```

The workflow also branches to `rejected`.

Promotion Workflow
```
Supabase (All Leagues)
    ↓  source_type = web_scrape, unverified
    ↓  Human contact + verification
    ↓  verification_status = human_verified
    ↓
Firebase (Verified Partners Only)
```

Knowledge vs Context
- Knowledge Base: persistent learning data; rarely changes; versioned
  - Use for: historical examples, prompt templates, schemas
  - Storage: `seed.examples-kb/`, `kb_catalog/`, `prompts/`
- Context: transient session state; changes frequently; ephemeral
  - Use for: active user sessions, workflow state, runtime caching
  - Storage: in-memory or session-scoped
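The knowledge/context split can be sketched with two in-memory classes: a store that keeps every version of what it is given, and a session context whose entries expire. Both `KnowledgeStore` and `SessionContext` are hypothetical; the real knowledge base lives in the directories listed above, and the TTL is an assumed expiry policy for session state.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class KnowledgeStore:
    """Persistent, versioned knowledge: each put keeps the earlier versions."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Any]] = {}

    def put(self, key: str, value: Any) -> int:
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)  # 1-based version number of the new entry

    def get(self, key: str, version: int | None = None) -> Any:
        history = self._versions[key]
        return history[-1] if version is None else history[version - 1]


class SessionContext:
    """Ephemeral session state: entries silently expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: forget it, as session state should
            return default
        return value
```

Knowledge only grows (old versions stay readable), while context is allowed to disappear; mixing the two in one store would make either versioning or expiry wrong.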
Environment Variables
Required in .env:
```bash
# Supabase (Required)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-key

# Firebase (Required for verified leagues)
FIREBASE_SERVICE_ACCOUNT_PATH=/path/to/service-account.json

# Frontend
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
```

Important Files
Core Services (Python)
- `apps/backend/services/unified_league_database.py`: main database orchestrator
- `apps/backend/services/supabase_adapter.py`: Supabase operations
- `apps/backend/services/firebase_adapter.py`: Firebase operations

Schemas
- `sql/core-schema.sql`: PostgreSQL schema with pipelines and automation
- `schemas/models/postgresql/init_db.sql`: full database initialization
- `schemas/core/`: core business logic schemas

Knowledge Systems
- `seed.examples-kb/api.py`: example retrieval API
- `prompts/integration_utilities.py`: prompt building utilities
- `ops/contextual_contract_builder.py`: contract generation

Documentation
- `DATABASE_ARCHITECTURE.md`: complete architecture details
- `QUICKSTART.md`: 5-minute setup guide
- `IMPLEMENTATION_SUMMARY.md`: system implementation overview
- `KNOWLEDGE_VS_CONTEXT_GUIDE.md`: knowledge/context separation
Database Schema Highlights
Supabase Tables
- `prospective_leagues`: all leagues with source tracking
- `scrape_sessions`: web scraping activity tracking
- `league_enrichment`: research and enrichment data
- `opportunity_evaluations`: AI-powered scoring history
- `contact_history`: outreach attempts and responses

PostgreSQL Pipeline Schema
- `pipelines`: partnership pipeline definitions
- `pipeline_stages`: stage definitions with probabilities
- `league_opportunities`: enhanced opportunity tracking
- `scoring_rules`: lead scoring automation
- `automation_rules`: workflow automation triggers
- `opportunity_activities`: activity and interaction tracking

Firebase Collections
- `verified_leagues`: verified partner leagues
- `contracts`: partnership contracts
- `communications`: email threads
- `user_accounts`: league owner accounts
Development Guidelines
Adding New Leagues
- Scraped discovery: use `upsert_scraped_league()` → Supabase only
- Human verification: use `upsert_verified_league()` → both databases
- Owner registration: use `upsert_owner_registered_league()` → both databases (highest trust)
Working with Knowledge Base
- Query examples before generating new content
- Store successful interactions for future learning
- Use prompt templates from the `prompts/` directory
- Keep knowledge separate from session context
Database Queries
Python Backend:
```python
from apps.backend.services.unified_league_database import (
    query_all_leagues,
    query_verified_leagues_only,
    query_scraped_leagues_only,
)

# Filter by attributes (inside an async function)
leagues = await query_all_leagues({"sport_name": "Basketball"})
```

TypeScript Frontend:

```typescript
import { getLeagueDatabaseClient } from '@/lib/league-database-client'

const client = getLeagueDatabaseClient()
const result = await client.query("Show me high-potential leagues")
```

Testing
```bash
# Test database adapters
python -m apps.backend.services.unified_league_database

# Test knowledge base
python -m database.seed.examples-kb.api

# Verify schema
psql -d your_database -f database/sql/core-schema.sql
```

Common Patterns
Opportunity Scoring Pipeline
- Scrape/ingest league data → Supabase
- Enrich with market research
- Score opportunity (AI-powered)
- Human review if score > threshold
- Contact and verify
- Promote to Firebase if verified
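The review and promotion gates in this pipeline can be sketched as a small state machine over the verification workflow. This is a hypothetical illustration: `advance`, `needs_human_review`, `can_promote`, and `REVIEW_THRESHOLD = 70` are invented names and values, and it assumes that forward moves may skip stages (the Supabase summary goes straight from `unverified` to `contacted`) and that any non-terminal stage can branch to `rejected`.

```python
from __future__ import annotations

# Stages of the verification workflow, in order
WORKFLOW = ["unverified", "investigating", "contacted", "human_verified", "partnership_active"]

# Assumptions: forward moves may skip stages; any non-terminal stage may be rejected
TRANSITIONS: dict[str, set[str]] = {
    state: set(WORKFLOW[i + 1:]) | {"rejected"} for i, state in enumerate(WORKFLOW[:-1])
}
TRANSITIONS["partnership_active"] = set()
TRANSITIONS["rejected"] = set()

REVIEW_THRESHOLD = 70  # hypothetical cut-off; the real threshold is not documented here


def advance(status: str, new_status: str) -> str:
    """Return new_status if the workflow allows the move, otherwise raise ValueError."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status


def needs_human_review(opportunity_score: float) -> bool:
    """Human-review step: only leagues scoring above the threshold are queued."""
    return opportunity_score > REVIEW_THRESHOLD


def can_promote(status: str) -> bool:
    """Promotion step: only human-verified leagues are copied to Firebase."""
    return status == "human_verified"
```

Rejecting backward moves (for example `human_verified` back to `contacted`) keeps the verification history monotonic, which is what makes `can_promote` a trustworthy gate.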
Contract Generation
- Query knowledge base for similar contracts
- Build context with league data
- Use a prompt template from `ops/contract_builders/`
- Generate contract with AI
- Store example in knowledge base
Email Intelligence
- Classify incoming email (triage system)
- Extract league information
- Store in Supabase with `source_type: email_ingest`
- Score opportunity
- Route to appropriate workflow
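A sketch of the extraction-to-storage step, assuming the triage classifier returns a dict with a `category` plus extracted league fields; the `league_inquiry` category and the input keys are hypothetical. The returned row uses column names that appear elsewhere in this file (`name`, `sport_name`, `source_type`, `verification_status`):

```python
from __future__ import annotations


def email_to_league_record(extracted: dict) -> dict | None:
    """Turn triage output into a prospective_leagues row, or None if it is not a league lead."""
    # Hypothetical triage output shape: {"category": ..., "league_name": ..., "sport": ...}
    if extracted.get("category") != "league_inquiry" or not extracted.get("league_name"):
        return None  # not a league lead: route the email to another workflow
    return {
        "name": extracted["league_name"],
        "sport_name": extracted.get("sport"),
        "source_type": "email_ingest",  # Supabase only until a human verifies it
        "verification_status": "unverified",  # enters at the start of the workflow
    }
```

Returning `None` rather than a partial row keeps non-league email out of `prospective_leagues`, so the routing step can send it elsewhere.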
Performance Considerations
- Use Supabase indexes for common queries (sport, tier, status, score)
- Cache knowledge base queries for repeated prompts
- Separate knowledge (persistent) from context (ephemeral)
- Mock adapters available for testing without real databases
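Two of these considerations, caching repeated knowledge-base lookups and testing without a real database, can be sketched with the standard library. `MockSupabaseAdapter` is a hypothetical stand-in that only mirrors the `query_leagues(filters)` call shape shown under Database Adapters, and `_search_examples` is a placeholder for the real lookup:

```python
from __future__ import annotations

import asyncio
from functools import lru_cache


class MockSupabaseAdapter:
    """In-memory stand-in that mimics the query_leagues(filters) call shape."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    async def query_leagues(self, filters: dict) -> list[dict]:
        # Exact match on every filter key, like an equality WHERE clause
        return [row for row in self._rows if all(row.get(k) == v for k, v in filters.items())]


LOOKUPS = {"count": 0}  # counts how often the underlying lookup really runs


def _search_examples(query: str, category: str, limit: int) -> list[str]:
    # Placeholder for the real knowledge-base lookup
    LOOKUPS["count"] += 1
    return [f"{category}/{query}/{i}" for i in range(limit)]


@lru_cache(maxsize=256)
def cached_examples(query: str, category: str, limit: int = 5) -> tuple[str, ...]:
    # Return a tuple so callers cannot mutate the cached result
    return tuple(_search_examples(query, category, limit))


if __name__ == "__main__":
    adapter = MockSupabaseAdapter([
        {"name": "International Basketball League", "sport_name": "Basketball"},
        {"name": "Premier Volleyball League", "sport_name": "Volleyball"},
    ])
    print(asyncio.run(adapter.query_leagues({"sport_name": "Basketball"})))
```

`functools.lru_cache` keys on the exact call arguments, so `cached_examples(q, c)` and `cached_examples(q, c, limit=5)` are cached separately; normalise call sites if hit rates matter.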
Architecture Philosophy: Retrieval Over Generation
This database layer prioritizes compression, storage, and retrieval over generation. Instead of regenerating content from scratch each time, we store successful outputs and retrieve modular, reusable components.
Core Principles
1. Retrieval-First Workflow
```
OLD: Query → Generate from scratch → Return
NEW: Query → Embed → Match → Retrieve → Compose (minimal generation)
```

2. Three-Tier Architecture
- Compression: Store successful outputs as reusable modules
- Indexing: Triple-point index (entity relationships + metadata + embeddings)
- Retrieval: Fast semantic search + graph-based relationships
3. Continuous Learning
- Store every successful output (contracts, responses, prompts)
- Track usage patterns and success rates
- Update embeddings and relationships based on feedback
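The retrieval-first loop can be sketched end to end with a toy embedding. This is a minimal illustration, not the planned embedding service: word-count vectors stand in for a real embedding model (such as one indexed in Chroma or FAISS), and `RetrievalFirst` is a hypothetical class. Generation runs only when no stored output clears `min_similarity`, and an output becomes retrievable once it is stored:

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy embedding (word counts); a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RetrievalFirst:
    """Query -> Embed -> Match -> Retrieve, falling back to generation on a miss."""

    def __init__(self, generate: Callable[[str], str], min_similarity: float = 0.8) -> None:
        self._generate = generate
        self._min_similarity = min_similarity
        self._store: list[tuple[Counter, str]] = []  # (embedding, approved output)

    def store(self, query: str, output: str) -> None:
        """Compression step: embed immediately so the output is reusable next time."""
        self._store.append((embed(query), output))

    def answer(self, query: str) -> tuple[str, str]:
        q = embed(query)
        scored = [(cosine(q, vec), output) for vec, output in self._store]
        best = max(scored, key=lambda pair: pair[0], default=None)
        if best is not None and best[0] >= self._min_similarity:
            return "retrieved", best[1]
        return "generated", self._generate(query)  # the expensive path
```

With this toy embedding, "premium basketball league partnership" shares three of four words with a stored "premium basketball league contract" and scores 0.75, so it still falls through to generation at the 0.8 threshold; a real embedding model is what lets paraphrases match.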
Technology Stack
Vector Embeddings
- Semantic similarity search for content matching
- Storage: Chroma (lightweight, Python-native)
- Alternatives: FAISS (fast), Qdrant (production-ready)
Triple-Point Index
- JSON-based relationship storage
- Links: entities ↔ metadata ↔ embeddings
- Enables both semantic and graph-based queries
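A minimal sketch of the relationship half of the triple-point index, assuming entities are string IDs and relationships are stored as JSON-serialisable `[subject, predicate, object]` triples. `TripleStore` and `expand` are hypothetical (this is not the planned `triple_store.py`); `expand` performs the breadth-first walk that a `graph_hops` parameter implies, starting from seed entities that a semantic search would supply:

```python
from __future__ import annotations

import json
from collections import deque


class TripleStore:
    """Entity relationships kept as JSON-serialisable (subject, predicate, object) triples."""

    def __init__(self, triples: list[list[str]] | None = None) -> None:
        self.triples: list[list[str]] = triples or []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append([subject, predicate, obj])

    def to_json(self) -> str:
        return json.dumps(self.triples)

    @classmethod
    def from_json(cls, text: str) -> TripleStore:
        return cls(json.loads(text))

    def expand(self, seeds: set[str], graph_hops: int) -> set[str]:
        """Breadth-first walk over relationships (either direction), up to graph_hops away."""
        adjacency: dict[str, set[str]] = {}
        for s, _, o in self.triples:
            adjacency.setdefault(s, set()).add(o)
            adjacency.setdefault(o, set()).add(s)
        seen = set(seeds)
        frontier = deque((entity, 0) for entity in seeds)
        while frontier:
            entity, depth = frontier.popleft()
            if depth == graph_hops:
                continue  # do not walk past the hop limit
            for neighbour in adjacency.get(entity, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
        return seen
```

Keeping triples as plain JSON lists matches the "JSON-based relationship storage" above; metadata and embeddings would be keyed by the same entity IDs.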
LangMem Integration (Optional)
- Memory and context management
- Essentially: JSON triple-point index + embedding spaces
- Useful for session state and cross-request learning
Directory Structure
```
database/
├── knowledge/                # Retrieval-first knowledge base
│   ├── embeddings/           # Vector embedding service
│   │   ├── service.py        # Embedding generation
│   │   ├── index.py          # Vector index (Chroma/FAISS)
│   │   └── config.py         # Model configurations
│   ├── index/                # Triple-point relationship storage
│   │   ├── triple_store.py   # Entity relationship storage
│   │   ├── query_engine.py   # Multi-modal querying
│   │   └── update_service.py # Incremental updates
│   ├── examples/             # Few-shot examples (JSONL + embeddings)
│   ├── schemas/              # Schema definitions
│   └── templates/            # Reusable modular components
├── config/                   # Configuration presets (retrieval-friendly)
└── schemas/                  # Type definitions
```

Usage Patterns
Contract Generation (Retrieval-First)
```python
# Assumes build_prompt, llm, assemble, retriever, composer and knowledge are
# already in scope (project services, not defined in this snippet).

# OLD: Generate from scratch (30 seconds)
def generate_contract(league_data):
    prompt = build_prompt(league_data)  # Regenerate the prompt every time
    sections = llm.generate(prompt)  # Full LLM call
    return assemble(sections)


# NEW: Retrieve + compose (3 seconds)
def retrieve_and_compose_contract(league_data):
    # Find similar successful contracts
    similar = retriever.find_similar(
        query=league_data.semantic_description,
        filters={"tier": league_data.tier, "sport": league_data.sport},
        min_similarity=0.8,
    )
    if not similar:
        # Nothing similar enough to reuse: fall back to full generation
        return generate_contract(league_data)

    # Compose from the best match (assuming results are sorted by similarity)
    contract = composer.assemble(
        base_template=similar[0],
        modifications=league_data.specific_terms,
        generate_only=["custom_clauses"],  # Minimal generation
    )

    # Store for future retrieval
    knowledge.store(contract, metadata=league_data, feedback="approved")
    return contract
```

Semantic Search Example
```python
from database.knowledge.index import QueryEngine

query = QueryEngine()
results = query.find(
    semantic="premium basketball league partnership",
    filters={"tier": "premium", "sport": "basketball"},
    min_similarity=0.8,
    graph_hops=2,  # Follow relationships
    limit=5,
)
```

Performance Benefits
| Operation | Before (Generation) | After (Retrieval) | Improvement |
|---|---|---|---|
| Contract generation | ~30 seconds | ~3 seconds | 10x faster |
| Response generation | ~10 seconds | ~1 second | 10x faster |
| Consistency | Variable | High (reuses proven patterns) | Quality ↑ |
| Learning | None | Continuous feedback loop | Intelligence ↑ |
Implementation Status
Existing Foundation
- `knowledge/examples/retriever.py`: semantic retrieval system
- `knowledge/examples/matcher.py`: similarity matching
- `knowledge/examples/cache.py`: LRU caching
- JSONL storage for few-shot examples

In Progress
- Vector embedding service (`knowledge/embeddings/`)
- Triple-point index system (`knowledge/index/`)
- Feedback loop for continuous learning

Planned
- Convert contract generation to retrieval-first
- Migrate prompt building to template retrieval
- Implement LangMem integration (optional)
Best Practices
- Store Every Success: when a contract is signed, a response approved, or an output works well → store it
- Embed Immediately: Generate embeddings when storing new content
- Update Relationships: Track which entities are used together
- Generate Minimally: Only generate what truly can't be retrieved/composed
- Close the Loop: Capture feedback to improve retrieval quality
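The last three practices (update relationships, generate minimally, close the loop) can be sketched as a feedback tracker that re-ranks retrieval candidates. The scoring rule, similarity multiplied by a Laplace-smoothed success rate, is an assumption chosen for illustration; `FeedbackTracker` and the component IDs are hypothetical:

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class FeedbackTracker:
    """Records outcomes and co-usage so retrieval can favour proven components."""

    successes: Counter = field(default_factory=Counter)
    uses: Counter = field(default_factory=Counter)
    used_together: Counter = field(default_factory=Counter)

    def record(self, component_ids: list[str], success: bool) -> None:
        for cid in component_ids:
            self.uses[cid] += 1
            if success:
                self.successes[cid] += 1
        # Update relationships: count every pair of components used in one output
        for pair in combinations(sorted(set(component_ids)), 2):
            self.used_together[pair] += 1

    def success_rate(self, cid: str) -> float:
        # Laplace smoothing keeps unseen components at 0.5 instead of 0 or 1
        return (self.successes[cid] + 1) / (self.uses[cid] + 2)

    def rerank(self, matches: list[tuple[str, float]]) -> list[tuple[str, float]]:
        """Re-score (component_id, similarity) pairs by similarity x success rate."""
        scored = [(cid, sim * self.success_rate(cid)) for cid, sim in matches]
        return sorted(scored, key=lambda item: item[1], reverse=True)
```

Smoothing matters here: without it, a component used once and approved once would outrank one approved in 9 of 10 uses.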
Migration Notes
Current structure is in transition:
- `seed.examples-kb/` → `knowledge/examples/`
- `kb_catalog/` → `knowledge/schemas/`
- `prompts/` → `knowledge/templates/`
- Session management → future `context/` directory
See KNOWLEDGE_VS_CONTEXT_GUIDE.md for migration details.
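The directory moves above amount to a prefix mapping. A hypothetical helper for rewriting path references during the migration, assuming paths are relative to the `database/` root:

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Old-to-new directory moves from the migration notes
MIGRATIONS = {
    "seed.examples-kb": "knowledge/examples",
    "kb_catalog": "knowledge/schemas",
    "prompts": "knowledge/templates",
}


def migrated_path(old: str) -> str:
    """Map a path under a legacy directory to its new location; other paths are unchanged."""
    parts = PurePosixPath(old).parts
    if parts and parts[0] in MIGRATIONS:
        return str(PurePosixPath(MIGRATIONS[parts[0]], *parts[1:]))
    return old
```

Matching on the first path component rather than a string prefix avoids rewriting unrelated names that merely start with `prompts`.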