Architecture
Supabase Multi-Database Implementation Summary

Source: data_layer/docs/SUPABASE_IMPLEMENTATION_SUMMARY.md

Supabase Multi-Database Implementation Summary

Overview

This implementation provides a Supabase-based multi-database system that extends the proven JSONL β†’ Database pattern to support Neo4j graph relationships and Pinecone vector search.

Architecture

JSONL Seed Files (Source of Truth)
         ↓
    Supabase PostgreSQL (Primary Database)
         ↓
    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
    ↓         ↓
  Neo4j    Pinecone
  (graph)   (vectors)

Key Design Decisions

  1. Supabase as Primary Database (instead of Prisma)

    • Uses existing Supabase infrastructure
    • Direct Python client integration (supabase-py)
    • No ORM layer needed
    • Matches existing backend architecture
  2. Graceful Degradation

    • System works with just Supabase
    • Neo4j and Pinecone are optional
    • Embeddings optional (requires OpenAI)
  3. Sync Tracking

    • Per-database sync status fields
    • Automatic sync during seeding
    • Retry-able failures

Implementation Files

Core Scripts

  1. seed_supabase_multi_db.py (550 lines)

    • Unified seeding across all databases
    • Auto-generates OpenAI embeddings
    • Syncs to Neo4j (relationships) and Pinecone (vectors)
    • Tracks sync status in Supabase
  2. validate_supabase_multi_db.py (280 lines)

    • Validates all database connections
    • Auto-loads .env from project root
    • Reports sync health and status
    • Saves results to JSON
  3. seed.examples.py (existing, 250 lines)

    • Seeds few-shot examples from JSONL
    • Compatible with Supabase schema

Documentation

  1. QUICKSTART_SUPABASE.md

    • 10-minute setup guide
    • SQL schema for Supabase tables
    • Environment configuration
    • Sample queries
  2. INDEX.md (updated)

    • Complete navigation guide
    • References Supabase scripts
    • Updated learning path
  3. SUPABASE_IMPLEMENTATION_SUMMARY.md

    • This file - implementation overview

Database Schema

Sport Archetypes Table

CREATE TABLE sport_archetypes (
  id UUID PRIMARY KEY,
  archetype_name TEXT UNIQUE,
  display_name TEXT,
  code_pattern TEXT,
  characteristics JSONB,
  embedding_vector JSONB,
  neo4j_node_id TEXT,
  created_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ
);

5 Master Archetypes:

  • racing - Racing & Speed Sports
  • combat - Combat Sports
  • team_sport - Team Sports
  • precision - Precision & Target Sports
  • large_field - Large Field Competitions

Prospective Leagues Table

CREATE TABLE prospective_leagues (
  id UUID PRIMARY KEY,
  league_tag TEXT UNIQUE,
  league_name TEXT,
  sport_name TEXT,
  sport_tier TEXT,
  sport_archetype TEXT,
 
  -- Embedding
  embedding_vector JSONB,
  embedding_model TEXT,
 
  -- Neo4j sync
  neo4j_node_id TEXT,
  neo4j_sync_status TEXT,
  neo4j_synced_at TIMESTAMPTZ,
 
  -- Pinecone sync
  pinecone_id TEXT,
  pinecone_namespace TEXT,
  pinecone_sync_status TEXT,
  pinecone_synced_at TIMESTAMPTZ,
 
  created_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ
);

Few-Shot Examples Table

CREATE TABLE few_shot_examples (
  id UUID PRIMARY KEY,
  example_id TEXT UNIQUE,
  category TEXT,
  scenario TEXT,
  sport TEXT,
  tier TEXT,
  complexity TEXT,
  quality_score DECIMAL,
  usage_count INTEGER,
  input_data JSONB,
  output_data JSONB,
  tags JSONB,
  embedding_vector JSONB,
  pinecone_sync_status TEXT,
  created_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ
);

Seeding Workflow

1. Archetypes (Required)

python scripts/seed_supabase_multi_db.py --archetypes

Process:

  1. Generate embeddings for each archetype
  2. Upsert to Supabase sport_archetypes table
  3. Sync to Neo4j as SportArchetype nodes

2. Leagues (Optional)

python scripts/seed_supabase_multi_db.py --leagues

Process:

  1. Read JSON files from seeds/ directory
  2. Generate embedding from league name + sport + description
  3. Upsert to Supabase prospective_leagues table
  4. Create Neo4j League node
  5. Create BELONGS_TO_ARCHETYPE relationship
  6. Upsert vector to Pinecone altsports-leagues index
  7. Update sync status in Supabase

3. Examples (Optional)

python scripts/seed_supabase_multi_db.py --examples

Process:

  1. Read JSONL files from seed.examples-kb/data/
  2. Generate embeddings from scenario + input
  3. Upsert to Supabase few_shot_examples table
  4. Upsert vector to Pinecone altsports-examples index

Neo4j Graph Model

Nodes

  • SportArchetype - 5 master archetype nodes
  • League - Individual league nodes

Relationships

  • (League)-[:BELONGS_TO_ARCHETYPE]->(SportArchetype)

Sample Cypher Queries

// Get all combat sports leagues
MATCH (l:League)-[:BELONGS_TO_ARCHETYPE]->(a:SportArchetype {name: "combat"})
RETURN l.league_name, l.status
 
// Count leagues per archetype
MATCH (l:League)-[:BELONGS_TO_ARCHETYPE]->(a:SportArchetype)
RETURN a.name, count(l) as league_count
ORDER BY league_count DESC

Pinecone Vector Search

Indexes

  1. altsports-leagues (dimension: 1536)

    • League embeddings
    • Metadata: league_name, sport_name, sport_archetype, tier
  2. altsports-examples (dimension: 1536)

    • Example embeddings
    • Namespaces by category (triage, contract_generation, etc.)

Sample Queries

from pinecone import Pinecone
from openai import OpenAI
 
# Generate query embedding
openai_client = OpenAI()
response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="Find high-speed racing leagues"
)
query_vector = response.data[0].embedding
 
# Search Pinecone
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
index = pc.Index('altsports-leagues')
 
results = index.query(
    vector=query_vector,
    top_k=10,
    include_metadata=True
)
 
# Get full data from Supabase
league_tags = [match['id'] for match in results['matches']]
supabase_results = supabase.table('prospective_leagues')\
    .select('*')\
    .in_('league_tag', league_tags)\
    .execute()

Multi-Database Query Patterns

Pattern 1: Semantic Search β†’ Supabase

# 1. Vector search in Pinecone
pinecone_results = index.query(vector=query_embedding, top_k=10)
 
# 2. Get full records from Supabase
league_tags = [r['id'] for r in pinecone_results['matches']]
leagues = supabase.table('prospective_leagues')\
    .select('*')\
    .in_('league_tag', league_tags)\
    .execute()

Pattern 2: Supabase β†’ Neo4j Relationships

# 1. Get league from Supabase
league = supabase.table('prospective_leagues')\
    .select('*')\
    .eq('league_tag', 'ufc')\
    .single()\
    .execute()
 
# 2. Get archetype siblings from Neo4j
with neo4j_driver.session() as session:
    result = session.run("""
        MATCH (l:League {league_tag: $tag})-[:BELONGS_TO_ARCHETYPE]->(a:SportArchetype)
        MATCH (sibling:League)-[:BELONGS_TO_ARCHETYPE]->(a)
        WHERE sibling.league_tag <> $tag
        RETURN sibling.league_name
    """, tag='ufc')

Pattern 3: Combined Multi-DB Query

# 1. Semantic search (Pinecone)
pinecone_matches = index.query(vector=embedding, top_k=20)
 
# 2. Get full data (Supabase)
leagues = supabase.table('prospective_leagues')\
    .select('*')\
    .in_('league_tag', [m['id'] for m in pinecone_matches])\
    .execute()
 
# 3. For each, get graph relationships (Neo4j)
for league in leagues.data:
    with neo4j_driver.session() as session:
        siblings = session.run("""
            MATCH (l:League {league_tag: $tag})-[:BELONGS_TO_ARCHETYPE]->(a)
            MATCH (sibling)-[:BELONGS_TO_ARCHETYPE]->(a)
            RETURN sibling.league_name
        """, tag=league['league_tag'])

Environment Setup

Required

SUPABASE_URL=postgresql://postgres:password@db.project.supabase.co:5432/postgres
SUPABASE_API_KEY=your_anon_key

Optional

# For embeddings
OPENAI_API_KEY=sk-...
 
# For graph relationships
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
 
# For vector search
PINECONE_API_KEY=...

Usage Examples

Setup and Validation

# Validate setup
python database/scripts/validate_supabase_multi_db.py
 
# Check results
cat database/validation_results.json

Seeding

# Seed everything
python database/scripts/seed_supabase_multi_db.py --all
 
# Seed specific components
python database/scripts/seed_supabase_multi_db.py --archetypes
python database/scripts/seed_supabase_multi_db.py --leagues --limit 10
python database/scripts/seed_supabase_multi_db.py --examples --category triage

Querying

from supabase import create_client
import os
 
# Connect
supabase = create_client(
    os.getenv('SUPABASE_URL'),
    os.getenv('SUPABASE_API_KEY')
)
 
# Query combat sports
result = supabase.table('prospective_leagues')\
    .select('*')\
    .eq('sport_archetype', 'combat')\
    .execute()
 
print(f"Found {len(result.data)} combat sports leagues")

Key Features

1. Sport Archetype System

  • 5 master categories for all sports
  • Code pattern mapping (RaceEvent, FightEvent, etc.)
  • Characteristics metadata (JSONB)
  • Graph navigation via Neo4j

2. Embedding Generation

  • Auto-generates OpenAI embeddings during seeding
  • text-embedding-3-small (1536 dimensions)
  • Stored as JSONB in Supabase
  • Synced to Pinecone for vector search

3. Sync Tracking

  • Per-database sync status (pending/synced/failed)
  • Timestamps for sync operations
  • Foreign key references (neo4j_node_id, pinecone_id)
  • Queryable sync health

4. Graceful Degradation

  • Works with just Supabase
  • Neo4j optional (graph features disabled)
  • Pinecone optional (semantic search disabled)
  • OpenAI optional (no embeddings generated)

Comparison to Original Prisma Approach

FeatureOriginal (Prisma)New (Supabase)
ORMPrismaNone (direct SQL)
SchemaPrisma schemaSupabase SQL
ClientPrisma PythonSupabase Python
MigrationsPrisma migrateSupabase migrations
Type SafetyPrisma typesManual types
IntegrationNew dependencyExisting backend

Why Supabase?

  • Already in use by existing backend
  • No new dependencies (prisma)
  • Direct SQL control
  • Matches existing architecture

Next Steps

  1. Create Supabase Tables - Run SQL from QUICKSTART_SUPABASE.md
  2. Validate Setup - Run validate_supabase_multi_db.py
  3. Seed Archetypes - Run with --archetypes flag
  4. Add League Seeds - Create JSON files in seeds/
  5. Test Queries - Use examples from QUICKSTART_SUPABASE.md

Troubleshooting

Supabase Connection Issues

  • Ensure SUPABASE_URL is the PostgreSQL connection string (starts with postgresql://)
  • Validator auto-extracts HTTP URL for Python client
  • Check SUPABASE_API_KEY is the anon key from dashboard

Neo4j Not Required

  • Neo4j is optional
  • System works without it
  • Graph features simply disabled

Embedding Generation Fails

  • Ensure OPENAI_API_KEY is set
  • Check OpenAI account credits
  • Embeddings are optional

Files Created

  1. scripts/seed_supabase_multi_db.py - Unified seeding
  2. scripts/validate_supabase_multi_db.py - Setup validator
  3. QUICKSTART_SUPABASE.md - Quick start guide
  4. SUPABASE_IMPLEMENTATION_SUMMARY.md - This file
  5. INDEX.md (updated) - Navigation guide

Architecture Benefits

  1. Backwards Compatible - Works with existing Supabase setup
  2. No New Dependencies - Uses existing supabase-py client
  3. Optional Extensions - Neo4j and Pinecone are opt-in
  4. Sync Tracking - Built-in sync status management
  5. Graceful Degradation - Works even if optional DBs unavailable

Status: Production Ready πŸš€ Pattern: JSONL β†’ Supabase β†’ Neo4j + Pinecone Built with: Supabase + Neo4j + Pinecone + OpenAI

Platform

Documentation

Community

Support

partnership@altsportsdata.comdev@altsportsleagues.ai

2025 Β© AltSportsLeagues.ai. Powered by AI-driven sports business intelligence.

πŸ€– AI-Enhancedβ€’πŸ“Š Data-Drivenβ€’βš‘ Real-Time