Architecture
CLAUDE.md

Source: data_layer/docs/CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Architecture Overview

This is the database layer for the AltSportsLeagues.ai sports partnership intelligence platform. It implements a dual-database architecture: Supabase holds every league (scraped and verified), while Firebase holds only verified partners. A knowledge base system supports AI-powered interactions.

Core Architecture Principles

Two-Tier Database Strategy:

  • Supabase: All leagues (scraped + verified) with source tracking and verification status
  • Firebase: Only verified partner leagues with active relationships
  • Knowledge Base: Persistent learning data, prompts, and interaction examples
  • Context Management: Transient session state and workflow tracking

Key Database Systems

  1. Supabase (Opportunity Database) - prospective_leagues table

    • All discovered leagues from web scraping, emails, forms
    • Opportunity scoring, enrichment data, contact history
    • Source tracking: web_scrape, email_ingest, form_submission, human_verified
    • Verification workflow: unverified → contacted → human_verified
  2. Firebase (Partner Database) - verified_leagues collection

    • Only human-verified partnerships
    • Contracts, communications, user accounts
    • Real-time updates and Google Sheets sync
  3. Knowledge Base - seed.examples-kb/, kb_catalog/, prompts/

    • Historical interaction examples for AI learning
    • Prompt templates and workflow recipes
    • Schema catalogs and document templates
  4. PostgreSQL Schema - sql/core-schema.sql

    • Enhanced pipeline management with stages
    • Opportunity scoring and automation rules
    • Activity tracking and analytics snapshots

Directory Structure

database/
├── seed.examples-kb/            # Historical AI interaction examples
├── kb_catalog/                  # Schema and prompt catalogs
├── prompts/                     # Prompt engineering system
├── ops/                         # Contract builders and workflows
├── output-styles/               # Document generation templates
├── schemas/                     # Data structure definitions
│   ├── core/                    # Core business schemas
│   ├── models/                  # Database models (PostgreSQL, Redshift)
│   └── typescript/              # TypeScript type definitions
├── scripts/                     # Database utilities and setup
├── sql/                         # SQL schema files
├── setup/                       # Initial setup scripts
└── docs/                        # Architecture documentation

Essential Commands

Database Setup

# Supabase setup (required)
# 1. Create project at https://supabase.com
# 2. Run SQL migration from schemas/models/postgresql/
# 3. Configure environment variables
 
# Test unified database system
cd apps/backend
python -m services.unified_league_database
 
# Initialize core PostgreSQL schema
psql -d your_database -f database/sql/core-schema.sql

Working with Leagues

import asyncio

from apps.backend.services.unified_league_database import (
    UnifiedLeagueDatabase,
    upsert_scraped_league,
    upsert_verified_league,
)


async def main():
    # Add scraped league (Supabase only)
    scraped = await upsert_scraped_league({
        "name": "International Basketball League",
        "sport_name": "Basketball",
        "sport_tier": "TIER2",
        "source_url": "https://example.com/ibl",
        "opportunity_score": 75
    })

    # Add verified league (both databases)
    verified = await upsert_verified_league(
        {"name": "Premier Volleyball League", "sport_name": "Volleyball"},
        user_context={"email": "partner@altsportsdata.com"}
    )

    # Promote scraped to verified
    db = UnifiedLeagueDatabase()
    promoted = await db.promote_to_firebase(
        supabase_league_id="abc-123",
        user_context={"email": "sales@altsportsdata.com"}
    )


# The calls above are awaited, so they must run inside an event loop
asyncio.run(main())

Knowledge Base Operations

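# Query knowledge base for examples
# Note: the directory is named seed.examples-kb/, which is not a valid Python
# package path, so this import assumes the module is exposed under an importable name
from database.seed.examples_kb import api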
 
examples = api.get_examples(
    query="contract generation",
    category="business_deals",
    limit=5
)
 
# Build prompt with context
from database.prompts import PromptBuilder
 
prompt_builder = PromptBuilder()
prompt = prompt_builder.build_with_context(
    template="contract_generation",
    context={"tier": "premium", "sport": "basketball"}
)

Database Adapters

import asyncio

from apps.backend.services.firebase_adapter import FirebaseAdapter
from apps.backend.services.supabase_adapter import SupabaseAdapter


async def main():
    # Supabase adapter
    supabase = SupabaseAdapter()
    leagues = await supabase.query_leagues({"sport_name": "Basketball"})

    # Firebase adapter
    firebase = FirebaseAdapter()
    verified = await firebase.get_verified_leagues()


# Adapter queries are awaited, so they must run inside an event loop
asyncio.run(main())

Key Concepts

Source Tracking

Every league has a source_type that determines Firebase eligibility:

Source Type                 Firebase?   Description
web_scrape                  ❌          Discovered via scraping
human_verified              ✅          Verified via human contact
league_owner_registration   ✅          Owner self-registered
email_ingest                ❌          Extracted from emails
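
The eligibility table above can be encoded as a plain lookup. This is a minimal sketch, not the repository's implementation: FIREBASE_ELIGIBLE and is_firebase_eligible are hypothetical names, and only the four source types in the table are covered (form_submission, mentioned earlier, is not in the table and is left out).

```python
# Hypothetical lookup mirroring the table above; names are illustrative only.
FIREBASE_ELIGIBLE = {
    "web_scrape": False,
    "human_verified": True,
    "league_owner_registration": True,
    "email_ingest": False,
}


def is_firebase_eligible(source_type: str) -> bool:
    """Return True if a league with this source_type may be written to Firebase."""
    try:
        return FIREBASE_ELIGIBLE[source_type]
    except KeyError:
        # Unknown source types fail loudly instead of being silently allowed.
        raise ValueError(f"Unknown source_type: {source_type!r}") from None
```

Raising on unknown values is a design choice made for this sketch: a new source type then has to be classified explicitly before it can reach Firebase.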

Verification Workflow

unverified → investigating → contacted → human_verified → partnership_active
     ↓
  rejected
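
The workflow above reads naturally as a small state machine. The sketch below is illustrative only: VerificationStatus, ALLOWED_TRANSITIONS, and advance are hypothetical names, and because the diagram does not say exactly which states can lead to rejected, the sketch assumes any state before human_verified can.

```python
from enum import Enum


class VerificationStatus(str, Enum):
    UNVERIFIED = "unverified"
    INVESTIGATING = "investigating"
    CONTACTED = "contacted"
    HUMAN_VERIFIED = "human_verified"
    PARTNERSHIP_ACTIVE = "partnership_active"
    REJECTED = "rejected"


S = VerificationStatus

# Assumption: "rejected" is reachable from every state before human_verified.
ALLOWED_TRANSITIONS = {
    S.UNVERIFIED: {S.INVESTIGATING, S.REJECTED},
    S.INVESTIGATING: {S.CONTACTED, S.REJECTED},
    S.CONTACTED: {S.HUMAN_VERIFIED, S.REJECTED},
    S.HUMAN_VERIFIED: {S.PARTNERSHIP_ACTIVE},
    S.PARTNERSHIP_ACTIVE: set(),
    S.REJECTED: set(),
}


def advance(current: VerificationStatus, target: VerificationStatus) -> VerificationStatus:
    """Return target if the move is allowed; otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```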

Promotion Workflow

Supabase (All Leagues)
    ↓ source_type = web_scrape, unverified
    ↓ Human contact + verification
    ↓ verification_status = human_verified
    ↓
Firebase (Verified Partners Only)
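
The promotion gate can be sketched with two dictionaries standing in for the Supabase table and the Firebase collection. Everything here is hypothetical (the promote function and the in-memory stores); the real promote_to_firebase method may behave differently.

```python
def promote(league_id: str, supabase_rows: dict, firebase_docs: dict) -> dict:
    """Copy a league into the Firebase stand-in once it is human-verified."""
    league = supabase_rows[league_id]
    if league.get("verification_status") != "human_verified":
        raise ValueError(
            f"League {league_id} is not human_verified and cannot be promoted"
        )
    # Supabase keeps the full record; Firebase receives its own copy.
    firebase_docs[league_id] = dict(league)
    return firebase_docs[league_id]
```

Note that the league stays in the Supabase stand-in after promotion, matching the rule that Supabase holds all leagues while Firebase holds only verified partners.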

Knowledge vs Context

  • Knowledge Base: Persistent learning data, rarely changes, versioned

    • Use for: Historical examples, prompt templates, schemas
    • Storage: seed.examples-kb/, kb_catalog/, prompts/
  • Context: Transient session state, frequently changes, ephemeral

    • Use for: Active user sessions, workflow state, runtime caching
    • Storage: In-memory or session-scoped
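
One way to keep the two concerns apart in code is to give them different types. The sketch below is an assumption-laden illustration: KnowledgeEntry and SessionContext are hypothetical classes, and the time-to-live expiry is one possible reading of "ephemeral", not something the repository is known to implement.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KnowledgeEntry:
    """Persistent, versioned record such as a prompt template or example."""
    key: str
    version: int
    content: str


@dataclass
class SessionContext:
    """Ephemeral per-session state that expires after ttl_seconds."""
    ttl_seconds: float
    created_at: float = field(default_factory=time.monotonic)
    state: dict = field(default_factory=dict)

    def is_expired(self, now=None) -> bool:
        # Allow the caller to pass a clock value so expiry is easy to test.
        now = time.monotonic() if now is None else now
        return now - self.created_at >= self.ttl_seconds
```

Freezing KnowledgeEntry means a change requires a new entry with a higher version, while SessionContext stays freely mutable until it expires.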

Environment Variables

Required in .env:

# Supabase (Required)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-key
 
# Firebase (Required for verified leagues)
FIREBASE_SERVICE_ACCOUNT_PATH=/path/to/service-account.json
 
# Frontend
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
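
A startup check for the backend variables listed above might look like the following. This is a sketch under assumptions: missing_env_vars is a hypothetical helper, and treating the Firebase path as optional via a flag is a choice made here because the comment above marks it as required only for verified leagues.

```python
import os

SUPABASE_VARS = ("SUPABASE_URL", "SUPABASE_ANON_KEY", "SUPABASE_SERVICE_KEY")
FIREBASE_VARS = ("FIREBASE_SERVICE_ACCOUNT_PATH",)


def missing_env_vars(env=None, require_firebase=True):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    names = SUPABASE_VARS + (FIREBASE_VARS if require_firebase else ())
    return [name for name in names if not env.get(name)]
```

Calling missing_env_vars() with no arguments inspects the real process environment; passing a dictionary makes the check easy to exercise in tests.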

Important Files

Core Services (Python)

  • apps/backend/services/unified_league_database.py - Main database orchestrator
  • apps/backend/services/supabase_adapter.py - Supabase operations
  • apps/backend/services/firebase_adapter.py - Firebase operations

Schemas

  • sql/core-schema.sql - PostgreSQL schema with pipelines and automation
  • schemas/models/postgresql/init_db.sql - Full database initialization
  • schemas/core/ - Core business logic schemas

Knowledge Systems

  • seed.examples-kb/api.py - Example retrieval API
  • prompts/integration_utilities.py - Prompt building utilities
  • ops/contextual_contract_builder.py - Contract generation

Documentation

  • DATABASE_ARCHITECTURE.md - Complete architecture details
  • QUICKSTART.md - 5-minute setup guide
  • IMPLEMENTATION_SUMMARY.md - System implementation overview
  • KNOWLEDGE_VS_CONTEXT_GUIDE.md - Knowledge/context separation

Database Schema Highlights

Supabase Tables

  • prospective_leagues - All leagues with source tracking
  • scrape_sessions - Web scraping activity tracking
  • league_enrichment - Research and enrichment data
  • opportunity_evaluations - AI-powered scoring history
  • contact_history - Outreach attempts and responses

PostgreSQL Pipeline Schema

  • pipelines - Partnership pipeline definitions
  • pipeline_stages - Stage definitions with probabilities
  • league_opportunities - Enhanced opportunity tracking
  • scoring_rules - Lead scoring automation
  • automation_rules - Workflow automation triggers
  • opportunity_activities - Activity and interaction tracking

Firebase Collections

  • verified_leagues - Verified partner leagues
  • contracts - Partnership contracts
  • communications - Email threads
  • user_accounts - League owner accounts

Development Guidelines

Adding New Leagues

  1. Scraped Discovery: Use upsert_scraped_league() → Supabase only
  2. Human Verification: Use upsert_verified_league() → Both databases
  3. Owner Registration: Use upsert_owner_registered_league() → Both databases (highest trust)

Working with Knowledge Base

  • Query examples before generating new content
  • Store successful interactions for future learning
  • Use prompt templates from prompts/ directory
  • Keep knowledge separate from session context

Database Queries

Python Backend:

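import asyncio

from apps.backend.services.unified_league_database import (
    query_all_leagues,
    query_verified_leagues_only,
    query_scraped_leagues_only
)


async def main():
    # Filter by attributes
    leagues = await query_all_leagues({"sport_name": "Basketball"})


# The query is awaited, so it must run inside an event loop
asyncio.run(main())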

TypeScript Frontend:

import { getLeagueDatabaseClient } from '@/lib/league-database-client'
 
const client = getLeagueDatabaseClient()
const result = await client.query("Show me high-potential leagues")

Testing

# Test database adapters
python -m apps.backend.services.unified_league_database
 
# Test knowledge base
python -m database.seed.examples-kb.api
 
# Verify schema
psql -d your_database -f database/sql/core-schema.sql

Common Patterns

Opportunity Scoring Pipeline

  1. Scrape/ingest league data → Supabase
  2. Enrich with market research
  3. Score opportunity (AI-powered)
  4. Human review if score > threshold
  5. Contact and verify
  6. Promote to Firebase if verified
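
Steps 3 to 6 above amount to a routing decision per league. The sketch below is hypothetical throughout: next_step, REVIEW_THRESHOLD, and the "hold" outcome for low scores are assumptions, since the document does not state the threshold or what happens below it.

```python
REVIEW_THRESHOLD = 70  # Hypothetical cutoff; the actual threshold is not documented.


def next_step(league: dict) -> str:
    """Return the next pipeline action for a league record."""
    if league.get("verification_status") == "human_verified":
        return "promote_to_firebase"
    score = league.get("opportunity_score")
    if score is None:
        return "score"
    if score > REVIEW_THRESHOLD:
        return "human_review"
    # Assumption: leagues at or below the threshold simply wait for new data.
    return "hold"
```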

Contract Generation

  1. Query knowledge base for similar contracts
  2. Build context with league data
  3. Use prompt template from ops/contract_builders/
  4. Generate contract with AI
  5. Store example in knowledge base

Email Intelligence

  1. Classify incoming email (triage system)
  2. Extract league information
  3. Store in Supabase with source_type: email_ingest
  4. Score opportunity
  5. Route to appropriate workflow

Performance Considerations

  • Use Supabase indexes for common queries (sport, tier, status, score)
  • Cache knowledge base queries for repeated prompts
  • Separate knowledge (persistent) from context (ephemeral)
  • Mock adapters available for testing without real databases
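
Caching repeated knowledge base queries can be as simple as wrapping the lookup in functools.lru_cache. In this sketch, _load_examples is a stand-in for an expensive lookup and cached_examples is a hypothetical wrapper; neither is the repository's API.

```python
from functools import lru_cache

CALLS = {"count": 0}  # Counts how often the slow path actually runs.


def _load_examples(query: str, category: str, limit: int) -> tuple:
    """Stand-in for an expensive knowledge base lookup."""
    CALLS["count"] += 1
    return tuple(f"{category}:{query}:{i}" for i in range(limit))


@lru_cache(maxsize=128)
def cached_examples(query: str, category: str, limit: int = 5) -> tuple:
    """Return examples, reusing the cached result for identical arguments."""
    return _load_examples(query, category, limit)
```

The function returns a tuple so that callers cannot mutate the cached value. Because lru_cache keys on the arguments exactly as passed, callers should use one consistent calling style to get cache hits.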

Architecture Philosophy: Retrieval Over Generation

This database layer prioritizes compression, storage, and retrieval over generation. Instead of regenerating content from scratch each time, we store successful outputs and retrieve modular, reusable components.

Core Principles

1. Retrieval-First Workflow

OLD: Query → Generate from scratch → Return
NEW: Query → Embed → Match → Retrieve → Compose (minimal generation)
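
The NEW flow can be illustrated with a toy example. This sketch swaps in a bag-of-words embedding for a real vector model, omits the compose step, and uses hypothetical names (embed, cosine, retrieve_or_generate); it only shows the control flow of retrieving first and generating as a fallback.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would use a vector model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_or_generate(query, store, generate, min_similarity=0.8):
    """Return (content, origin), where origin is 'retrieved' or 'generated'."""
    query_vec = embed(query)
    best_content, best_score = None, 0.0
    for description, content in store.items():
        score = cosine(query_vec, embed(description))
        if score > best_score:
            best_content, best_score = content, score
    if best_content is not None and best_score >= min_similarity:
        return best_content, "retrieved"
    # Nothing similar enough is stored, so fall back to generation.
    return generate(query), "generated"
```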

2. Three-Tier Architecture

  • Compression: Store successful outputs as reusable modules
  • Indexing: Triple-point index (entity relationships + metadata + embeddings)
  • Retrieval: Fast semantic search + graph-based relationships

3. Continuous Learning

  • Store every successful output (contracts, responses, prompts)
  • Track usage patterns and success rates
  • Update embeddings and relationships based on feedback
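
Tracking usage and success rates needs little more than two counters per stored output. UsageStats below is a hypothetical structure for illustration, not the repository's feedback model.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Running usage counters for one stored output."""
    uses: int = 0
    successes: int = 0

    def record(self, success: bool) -> None:
        """Register one use and whether it was judged successful."""
        self.uses += 1
        if success:
            self.successes += 1

    @property
    def success_rate(self) -> float:
        # An unused item has no evidence either way; report 0.0.
        return self.successes / self.uses if self.uses else 0.0
```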

Technology Stack

Vector Embeddings

  • Semantic similarity search for content matching
  • Storage: Chroma (lightweight, Python-native)
  • Alternatives: FAISS (fast), Qdrant (production-ready)

Triple-Point Index

  • JSON-based relationship storage
  • Links: entities ↔ metadata ↔ embeddings
  • Enables both semantic and graph-based queries
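
A JSON-serializable layout for such an index, with a bounded walk over entity links, could look like this. The record layout, the identifiers (league:ibl, contract:c1, template:t1), and related_within are all hypothetical; the sketch shows only the idea of three sections keyed by the same entity IDs.

```python
import json
from collections import deque

# Hypothetical layout: three sections keyed by the same entity IDs.
INDEX = {
    "entities": {
        "league:ibl": {"related": ["contract:c1"]},
        "contract:c1": {"related": ["template:t1"]},
        "template:t1": {"related": []},
    },
    "metadata": {
        "league:ibl": {"sport": "basketball", "tier": "premium"},
        "contract:c1": {"status": "signed"},
        "template:t1": {"kind": "contract_template"},
    },
    "embeddings": {
        "league:ibl": [0.1, 0.9],
        "contract:c1": [0.2, 0.8],
        "template:t1": [0.3, 0.7],
    },
}


def related_within(index: dict, start: str, max_hops: int) -> list:
    """Breadth-first walk over entity links, up to max_hops from start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in index["entities"][node]["related"]:
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


# The whole index survives a JSON round trip, so it can be stored as a file.
serialized = json.dumps(INDEX)
```

The graph walk covers relationship queries, while the embeddings section would feed a similarity search over the same IDs.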

LangMem Integration (Optional)

  • Memory and context management
  • Essentially: JSON triple-point index + embedding spaces
  • Useful for session state and cross-request learning

Directory Structure

database/
├── knowledge/                  # Retrieval-first knowledge base
│   ├── embeddings/            # Vector embedding service
│   │   ├── service.py         # Embedding generation
│   │   ├── index.py           # Vector index (Chroma/FAISS)
│   │   └── config.py          # Model configurations
│   ├── index/                 # Triple-point relationship storage
│   │   ├── triple_store.py    # Entity relationship storage
│   │   ├── query_engine.py    # Multi-modal querying
│   │   └── update_service.py  # Incremental updates
│   ├── examples/              # Few-shot examples (JSONL + embeddings)
│   ├── schemas/               # Schema definitions
│   └── templates/             # Reusable modular components
├── config/                     # Configuration presets (retrieval-friendly)
└── schemas/                    # Type definitions

Usage Patterns

Contract Generation (Retrieval-First)

# OLD: Generate from scratch (30 seconds)
def generate_contract(league_data):
    prompt = build_prompt(league_data)        # Regenerate
    sections = llm.generate(prompt)           # LLM call
    return assemble(sections)
 
# NEW: Retrieve + compose (3 seconds)
def retrieve_and_compose_contract(league_data):
    # Find similar successful contracts
    similar = retriever.find_similar(
        query=league_data.semantic_description,
        filters={"tier": league_data.tier, "sport": league_data.sport},
        min_similarity=0.8
    )
 
    # Compose from retrieved modules
    contract = composer.assemble(
        base_template=similar[0],
        modifications=league_data.specific_terms,
        generate_only=["custom_clauses"]  # Minimal generation
    )
 
    # Store for future retrieval
    knowledge.store(contract, metadata=league_data, feedback="approved")
    return contract

Semantic Search Example

from database.knowledge.index import QueryEngine
 
query = QueryEngine()
results = query.find(
    semantic="premium basketball league partnership",
    filters={"tier": "premium", "sport": "basketball"},
    min_similarity=0.8,
    graph_hops=2,  # Follow relationships
    limit=5
)

Performance Benefits

Operation             Before (Generation)   After (Retrieval)               Improvement
Contract generation   ~30 seconds           ~3 seconds                      10x faster
Response generation   ~10 seconds           ~1 second                       10x faster
Consistency           Variable              High (reuses proven patterns)   Quality ↑
Learning              None                  Continuous feedback loop        Intelligence ↑

Implementation Status

✅ Existing Foundation

  • knowledge/examples/retriever.py - Semantic retrieval system
  • knowledge/examples/matcher.py - Similarity matching
  • knowledge/examples/cache.py - LRU caching
  • JSONL storage for few-shot examples

🚧 In Progress

  • Vector embedding service (knowledge/embeddings/)
  • Triple-point index system (knowledge/index/)
  • Feedback loop for continuous learning

📋 Planned

  • Convert contract generation to retrieval-first
  • Migrate prompt building to template retrieval
  • Implement LangMem integration (optional)

Best Practices

  1. Store Every Success: When a contract is signed, response approved, or output works well → store it
  2. Embed Immediately: Generate embeddings when storing new content
  3. Update Relationships: Track which entities are used together
  4. Generate Minimally: Only generate what truly can't be retrieved/composed
  5. Close the Loop: Capture feedback to improve retrieval quality

Migration Notes

Current structure is in transition:

  • seed.examples-kb/ → knowledge/examples/
  • kb_catalog/ → knowledge/schemas/
  • prompts/ → knowledge/templates/
  • Session management → future context/ directory

See KNOWLEDGE_VS_CONTEXT_GUIDE.md for migration details.

Support

partnership@altsportsdata.com
dev@altsportsleagues.ai

2025 © AltSportsLeagues.ai. Powered by AI-driven sports business intelligence.