Source: data_layer/docs/QUICKSTART_MULTI_DB.md
Multi-Database Quick Start Guide
Get your Prisma + Neo4j + Pinecone system running in 10 minutes!
π 5-Minute Setup
Step 1: Install Dependencies
cd database
# Core dependencies
uv add prisma prisma-client neo4j pinecone-client openai python-dotenv
# Generate Prisma client
cd schemas/prisma
uv run prisma generate --schema=schema.v2.enhanced.prismaStep 2: Setup Environment
Create database/.env:
# Required: PostgreSQL
DATABASE_URL="postgresql://user:password@localhost:5432/altsports?schema=public"
# Optional but recommended: Neo4j
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="your_password"
# Optional but recommended: Pinecone
PINECONE_API_KEY="your_pinecone_key"
# Required if using Pinecone: OpenAI
OPENAI_API_KEY="your_openai_key"Step 3: Initialize Databases
# PostgreSQL schema
uv run prisma migrate dev --schema=schemas/prisma/schema.v2.enhanced.prisma
# Neo4j schema (if using Neo4j)
cat schemas/models/neo4j/init_db.cql | cypher-shell -u neo4j -p your_passwordStep 4: Seed Data
# Seed everything (recommended first run)
uv run python scripts/seed_unified_multi_db.py --archetypes --leagues --examples
# Or seed selectively
uv run python scripts/seed_unified_multi_db.py --archetypes # Just archetypes
uv run python scripts/seed_unified_multi_db.py --leagues --limit 5 # 5 leagues
uv run python scripts/seed_unified_multi_db.py --examples --category triage # One categoryOutput:
π Connecting to databases...
β
PostgreSQL (Prisma)
β
Neo4j
β
Pinecone
β
OpenAI (embeddings)
ποΈ Seeding Sport Archetypes...
β¨ Created archetype: racing
β
Synced to Neo4j
β
Synced to Pinecone
...
π MULTI-DATABASE SEEDING SUMMARY
==================================================
Total Processed: 15
PostgreSQL: β¨ 10 created, βοΈ 5 updated
Neo4j: β
15 synced
Pinecone: β
15 synced
==================================================π― Quick Examples
Example 1: Query by Archetype (PostgreSQL)
from prisma import Prisma
db = Prisma()
await db.connect()
# Get all racing leagues
racing_leagues = await db.league.find_many(
where={"sport_archetype": "racing", "status": "active"},
include={"Documents": True}
)
for league in racing_leagues:
print(f"{league.league_name} - {league.location}")Example 2: Semantic Search (Pinecone β PostgreSQL)
from pinecone import Pinecone
from openai import OpenAI
# Generate query embedding
openai_client = OpenAI()
query_embedding = openai_client.embeddings.create(
model="text-embedding-3-small",
input="High-speed racing with global viewership"
).data[0].embedding
# Search Pinecone
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
index = pc.Index("altsports-leagues")
matches = index.query(
vector=query_embedding,
top_k=10,
include_metadata=True
)
# Get full data from PostgreSQL
league_tags = [m['id'] for m in matches['matches']]
leagues = await db.league.find_many(
where={"league_tag": {"in": league_tags}}
)
for league in leagues:
print(f"{league.league_name}: {league.sport_archetype}")Example 3: Graph Traversal (Neo4j)
from neo4j import GraphDatabase
driver = GraphDatabase.driver(
"bolt://localhost:7687",
auth=("neo4j", "password")
)
with driver.session() as session:
# Find all combat sports
result = session.run("""
MATCH (l:League)-[:BELONGS_TO_ARCHETYPE]->(a:SportArchetype {name: 'combat'})
RETURN l.league_name as name, l.tier as tier
ORDER BY l.tier
""")
for record in result:
print(f"{record['name']}: {record['tier']}")Example 4: Similar Leagues (Multi-DB Query)
# 1. Get league from PostgreSQL
ufc = await db.league.find_unique(where={"league_tag": "ufc"})
# 2. Find similar via Pinecone
similar_matches = index.query(
vector=ufc.embedding_vector,
top_k=10,
filter={"sport_archetype": "combat"}
)
# 3. Get archetype siblings from Neo4j
with driver.session() as session:
archetype_siblings = session.run("""
MATCH (ufc:League {league_tag: 'ufc'})-[:BELONGS_TO_ARCHETYPE]->(a)
MATCH (sibling:League)-[:BELONGS_TO_ARCHETYPE]->(a)
WHERE sibling.league_tag <> 'ufc'
RETURN sibling.league_tag as tag, sibling.league_name as name
""")
print("Archetype Siblings:")
for record in archetype_siblings:
print(f" - {record['name']}")
# 4. Combine results
all_similar = set([m['id'] for m in similar_matches['matches']]) | \
set([r['tag'] for r in archetype_siblings])
print(f"\nTotal similar leagues: {len(all_similar)}")π Common Tasks
Add a New League (Manual)
Option 1: Edit JSONL, Re-Seed (Recommended)
# 1. Edit seeds file
echo '{"league_tag": "pfl", "league_name": "PFL", "sport_name": "MMA", "sport_archetype": "combat", "status": "active"}' >> seeds/pfl_league.seed.json
# 2. Re-seed
uv run python scripts/seed_unified_multi_db.py --leagues
# β
Automatically syncs to PostgreSQL, Neo4j, PineconeOption 2: Programmatic (For dynamic data)
from scripts.seed_unified_multi_db import UnifiedSeedManager
manager = UnifiedSeedManager()
await manager.connect_all()
# Seed single league
await manager.seed_single_league({
"league_tag": "pfl",
"league_name": "PFL",
"sport_name": "MMA",
"sport_archetype": "combat",
"status": "active",
"location": "USA"
})
# β
Syncs to all databases
await manager.disconnect_all()Check Sync Status
# Get leagues not yet synced to Neo4j
unsynced = await db.league.find_many(
where={"neo4j_sync_status": {"in": ["pending", "failed"]}}
)
print(f"Leagues pending Neo4j sync: {len(unsynced)}")
# Get failed Pinecone syncs
failed = await db.league.find_many(
where={"pinecone_sync_status": "failed"}
)
for league in failed:
print(f"β {league.league_name}")Re-Sync a Single League
# Re-generate embedding and sync
league = await db.league.find_unique(where={"league_tag": "formula_1"})
manager = UnifiedSeedManager()
await manager.connect_all()
# This will update embedding and re-sync
await manager.seed_single_league({
"league_tag": league.league_tag,
"league_name": league.league_name,
"sport_archetype": league.sport_archetype,
# ... other fields
})
await manager.disconnect_all()Batch Update Embeddings
# Find leagues without embeddings
no_embedding = await db.league.find_many(
where={"embedding_vector": None, "status": "active"}
)
manager = UnifiedSeedManager()
await manager.connect_all()
for league in no_embedding:
# Generate embedding
text = manager.create_league_embedding_text(league.dict())
embedding = manager.generate_embedding(text)
# Update PostgreSQL
await db.league.update(
where={"id": league.id},
data={
"embedding_vector": embedding,
"embedding_model": "text-embedding-3-small",
"embedding_updated_at": datetime.utcnow(),
"pinecone_sync_status": "pending"
}
)
# Sync to Pinecone
if embedding:
await manager._sync_league_to_pinecone(league, embedding, {})
print(f"β
Updated {len(no_embedding)} leagues")π Debugging
Check Neo4j Connection
# Via cypher-shell
cypher-shell -u neo4j -p password
# Test query
MATCH (a:SportArchetype) RETURN a.name, a.display_name;Check Pinecone Indexes
from pinecone import Pinecone
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
indexes = pc.list_indexes()
print("Available indexes:")
for idx in indexes:
print(f" - {idx.name}: {idx.dimension}d")
# Check vector count
index = pc.Index("altsports-leagues")
stats = index.describe_index_stats()
print(f"\nVectors in altsports-leagues: {stats['total_vector_count']}")Check PostgreSQL Sync Status
from prisma import Prisma
db = Prisma()
await db.connect()
# Summary of sync status
sync_summary = await db.league.group_by(
by=["neo4j_sync_status", "pinecone_sync_status"],
count=True
)
print("Sync Status Summary:")
for status in sync_summary:
print(f" Neo4j: {status['neo4j_sync_status']}, "
f"Pinecone: {status['pinecone_sync_status']}, "
f"Count: {status['_count']}")π Learning Path
Day 1: PostgreSQL Queries
- Start with simple Prisma queries
- Master filtering, includes, and ordering
- Learn about the sync status fields
Day 2: Vector Search
- Generate embeddings for test queries
- Query Pinecone with different top_k values
- Combine Pinecone results with PostgreSQL data
Day 3: Graph Patterns
- Write basic Neo4j MATCH queries
- Explore archetype relationships
- Try pattern matching (e.g., find leagues without contracts)
Day 4: Multi-DB Queries
- Combine PostgreSQL + Pinecone
- Combine Neo4j + PostgreSQL
- Build end-to-end semantic search with graph traversal
Day 5: Advanced Features
- Implement similarity relationship detection
- Build sync health dashboard
- Create custom embedding update workflows
π¨ Common Issues
Issue: Prisma client not generated
cd database/schemas/prisma
uv run prisma generate --schema=schema.v2.enhanced.prismaIssue: Neo4j connection refused
# Check Neo4j is running
neo4j status
# Or use Docker
docker run -d --name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
neo4j:latestIssue: Pinecone index not found
# Indexes are auto-created on first seed
# Or create manually:
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
pc.create_index(
name="altsports-leagues",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)Issue: Embedding generation fails
# Check OpenAI API key
import os
from openai import OpenAI
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
test = client.embeddings.create(
model="text-embedding-3-small",
input="test"
)
print(f"β
OpenAI working: {len(test.data[0].embedding)}d embedding")π Next Steps
- β Complete this quickstart
- π Read
MULTI_DATABASE_ARCHITECTURE.mdfor deep dive - π¨ Build your first multi-DB query
- π― Add custom archetype relationships
- π Deploy to production!
Questions? Check the full architecture guide or the existing seed.examples-kb/README.md for the proven JSONL β Prisma pattern this extends! π