Knowledge Retrieval System

Source: data_layer/docs/RETRIEVAL_SYSTEM_README.md

Overview

This system implements a retrieval-first architecture that prioritizes compression, storage, and retrieval over generation. Instead of regenerating content from scratch each time, we store successful outputs and retrieve modular, reusable components.

Philosophy

OLD: Query β†’ Generate from scratch β†’ Return (30s, inconsistent)
NEW: Query β†’ Embed β†’ Match β†’ Retrieve β†’ Compose (3s, consistent, learning)

Core Principles

  1. Store Every Success: Contracts, responses, prompts that work well
  2. Embed Immediately: Generate vector embeddings when storing content
  3. Triple-Point Index: Link entities ↔ metadata ↔ embeddings
  4. Generate Minimally: Only generate what can't be retrieved/composed
  5. Close the Loop: Capture feedback to improve retrieval quality

Architecture

Three-Tier System

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Query Interface                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  Query Engine (Multi-Modal)                 β”‚
β”‚  β€’ Semantic search (embeddings)                             β”‚
β”‚  β€’ Graph queries (relationships)                            β”‚
β”‚  β€’ Metadata filters                                         β”‚
β”‚  β€’ Hybrid scoring                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             ↓
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        ↓                                         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Vector Index      β”‚                  β”‚  Triple Store      β”‚
β”‚  (Embeddings)      β”‚  ←──────────→   β”‚  (Relationships)   β”‚
β”‚                    β”‚                  β”‚                    β”‚
β”‚  β€’ Chroma/FAISS    β”‚                  β”‚  β€’ Entities        β”‚
β”‚  β€’ Semantic search β”‚                  β”‚  β€’ Metadata        β”‚
β”‚  β€’ Fast similarity β”‚                  β”‚  β€’ Graph queries   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Directory Structure

knowledge/
β”œβ”€β”€ embeddings/              # Vector embedding service
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ config.py           # Model configurations
β”‚   β”œβ”€β”€ service.py          # Embedding generation
β”‚   └── index.py            # Vector similarity search (Chroma/FAISS)
β”‚
β”œβ”€β”€ index/                   # Triple-point relationship storage
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ triple_store.py     # Entity & relationship management
β”‚   β”œβ”€β”€ query_engine.py     # Multi-modal querying
β”‚   └── update_service.py   # Incremental updates & feedback loops
β”‚
β”œβ”€β”€ examples/                # Few-shot examples (JSONL + embeddings)
β”‚   β”œβ”€β”€ data/               # JSONL storage
β”‚   β”œβ”€β”€ retriever.py        # Existing retrieval logic
β”‚   β”œβ”€β”€ matcher.py          # Similarity matching
β”‚   └── cache.py            # LRU caching
β”‚
β”œβ”€β”€ schemas/                 # Schema definitions
β”œβ”€β”€ templates/               # Reusable modular components
└── RETRIEVAL_SYSTEM_README.md  # This file

Components

1. Embedding Service (embeddings/)

Generates vector embeddings for semantic similarity search.

Supported Models:

  • OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
  • Local (free): all-MiniLM-L6-v2, all-mpnet-base-v2
  • Cohere: embed-english-v3.0
  • Google: textembedding-gecko

Quick Start:

from knowledge.embeddings import EmbeddingService, EmbeddingConfig, ModelType
 
# Fast local embedding (free)
config = EmbeddingConfig.from_model_type(ModelType.MINILM_L6)
service = EmbeddingService(config)
await service.initialize()
 
embedding = await service.embed("Premium basketball league contract")
# Returns: Embedding(text="...", vector=np.array([...]), dimensions=384)
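
Embeddings can also be compared directly. A quick semantic-similarity check between two texts, computing cosine similarity explicitly rather than assuming the service returns unit-normalized vectors:

import numpy as np

a = await service.embed("Premium basketball league contract")
b = await service.embed("Tier 1 basketball partnership agreement")

# Cosine similarity = dot product over the product of vector norms
cos = np.dot(a.vector, b.vector) / (np.linalg.norm(a.vector) * np.linalg.norm(b.vector))
print(f"similarity: {cos:.3f}")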

Production Setup:

# OpenAI (good quality + speed balance)
config = EmbeddingConfig.production(api_key="sk-...")
service = EmbeddingService(config)
 
# Batch processing
embeddings = await service.embed([
    "Contract template for tier 1 leagues",
    "Response template for partnership inquiries",
    "Onboarding workflow for new leagues"
])

2. Vector Index (embeddings/index.py)

Fast similarity search using Chroma or FAISS.

Features:

  • Semantic search via vector similarity
  • Metadata filtering
  • Persistent storage
  • Both Chroma (easy) and FAISS (fast) backends

Quick Start:

from knowledge.embeddings import EmbeddingService, VectorIndex
 
service = EmbeddingService()
await service.initialize()
 
index = VectorIndex(service, backend="chroma")
await index.initialize()
 
# Add documents
await index.add(
    texts=["Premium basketball contract", "Standard soccer template"],
    ids=["contract_001", "template_002"],
    metadatas=[
        {"tier": "premium", "sport": "basketball"},
        {"tier": "standard", "sport": "soccer"}
    ]
)
 
# Search
results = await index.search(
    query="high-tier basketball agreement",
    limit=5,
    min_score=0.7,
    filters={"sport": "basketball"}
)
 
for result in results:
    print(f"{result.id}: {result.score:.2f} - {result.text[:50]}")

3. Triple Store (index/triple_store.py)

Manages entity relationships and metadata.

Entity Types:

  • Contract, Template, Prompt, Example, League, Response, Document

Relationship Types:

  • based_on, variant_of, used_in, similar_to, contains, derived_from, supersedes

Quick Start:

from knowledge.index import TripleStore, Entity, Relationship, EntityType, RelationType
 
store = TripleStore()
 
# Add entities
contract = Entity(
    id="contract_premium_basketball_001",
    type=EntityType.CONTRACT,
    name="Premium Basketball League Contract",
    content_ref="knowledge/examples/data/contracts.jsonl#47",
    embedding_ref="emb_contract_prem_bball_001",
    metadata={
        "tier": "premium",
        "sport": "basketball",
        "success_rate": 0.92,
        "usage_count": 47
    }
)
store.add_entity(contract)
 
# Add relationships
template = Entity(
    id="template_basketball_base",
    type=EntityType.TEMPLATE,
    name="Basketball Contract Template",
    content_ref="knowledge/templates/basketball_contract.json"
)
store.add_entity(template)
 
rel = Relationship(
    source_id="contract_premium_basketball_001",
    target_id="template_basketball_base",
    type=RelationType.BASED_ON,
    weight=1.0
)
store.add_relationship(rel)
 
# Query relationships
related = store.get_related_entities(
    entity_id="contract_premium_basketball_001",
    rel_type=RelationType.BASED_ON,
    max_hops=2
)

Usage Patterns

Pattern 1: Contract Generation β†’ Retrieval + Composition

Before (Generation):

def generate_contract(league_data):
    prompt = build_prompt(league_data)        # 2s - regenerate
    sections = llm.generate(prompt)           # 25s - LLM call
    contract = assemble(sections)             # 3s - assembly
    return contract                           # Total: 30s

After (Retrieval):

async def retrieve_and_compose_contract(league_data):
    # 1. Find similar successful contracts (0.5s)
    similar = await vector_index.search(
        query=league_data.semantic_description,
        filters={"tier": league_data.tier, "sport": league_data.sport},
        limit=3
    )
 
    # 2. Get relationships (0.2s)
    base_entity_id = similar[0].metadata['entity_id']
    related = triple_store.get_related_entities(
        entity_id=base_entity_id,
        rel_type=RelationType.BASED_ON
    )
 
    # 3. Compose from modules (2s)
    contract = composer.assemble(
        base_template=similar[0].text,
        customizations=league_data.specific_terms,
        generate_only=["custom_clauses"]  # Minimal LLM use
    )
 
    # 4. Store for future retrieval (0.3s)
    await store_successful_contract(contract, league_data)
 
    return contract  # Total: 3s
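
The composer above is referenced but not defined in this README. A minimal sketch of what assemble might look like, assuming templates carry {placeholder} slots and an llm client is available for the few genuinely novel sections:

class ContractComposer:
    """Hypothetical composer: fill retrieved templates, generate only the gaps."""

    def __init__(self, llm):
        self.llm = llm

    def assemble(self, base_template: str, customizations: dict,
                 generate_only: list | None = None) -> str:
        contract = base_template
        # Fill known slots from retrieved/structured data -- no LLM needed
        for key, value in customizations.items():
            contract = contract.replace("{" + key + "}", str(value))

        # Generate only the sections that cannot be retrieved or composed
        for section in generate_only or []:
            generated = self.llm.generate(f"Draft the {section} section for: {customizations}")
            contract = contract.replace("{" + section + "}", generated)
        return contract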

Pattern 2: Response Generation β†’ Template Retrieval

Before:

def generate_response(email):
    classification = classify(email)      # 3s
    prompt = build_response_prompt(email) # 1s
    response = llm.generate(prompt)       # 8s
    return response                       # Total: 12s

After:

async def retrieve_response_template(email):
    # Embed email content
    email_embedding = await embedding_service.embed(email.body)
 
    # Find similar past responses (0.5s)
    similar_responses = await vector_index.search(
        query=email_embedding,
        filters={"category": email.category, "tier": email.tier},
        limit=3
    )
 
    # Adapt template with minimal generation (1s)
    response = adapt_template(
        template=similar_responses[0].text,
        context={"league_name": email.league_name}
    )
 
    return response  # Total: 1.5s
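
adapt_template is likewise undefined here. Under the same {placeholder} assumption, it can be as simple as:

def adapt_template(template: str, context: dict) -> str:
    """Hypothetical helper: substitute context values into a retrieved response."""
    response = template
    for key, value in context.items():
        response = response.replace("{" + key + "}", str(value))
    return response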

Pattern 3: Feedback Loop β†’ Continuous Learning

async def store_with_feedback(content, entity_type, metadata, feedback):
    """Store successful outputs and update quality scores"""

    # 1. Generate embedding
    embedding = await embedding_service.embed(content)

    # 2. Create entity (generate the id once so entity and embedding refs match)
    entity_id = generate_id()
    entity = Entity(
        id=f"{entity_type}_{entity_id}",
        type=entity_type,
        name=metadata.get('name', 'Unnamed'),
        content_ref=f"storage/{entity_type}s.jsonl#{get_line_number()}",
        embedding_ref=f"emb_{entity_type}_{entity_id}",
        metadata={
            **metadata,
            "quality_score": calculate_quality(feedback),
            "created_at": datetime.now().isoformat()  # requires: from datetime import datetime
        }
    )
    triple_store.add_entity(entity)
 
    # 3. Add to vector index
    await vector_index.add(
        texts=[content],
        ids=[entity.id],
        metadatas=[entity.metadata],
        embeddings=[embedding.vector]
    )
 
    # 4. Create relationships to similar entities
    similar = await vector_index.search(content, limit=5, min_score=0.8)
    for result in similar[1:]:  # Skip self
        if result.score > 0.85:
            rel = Relationship(
                source_id=entity.id,
                target_id=result.id,
                type=RelationType.SIMILAR_TO,
                weight=result.score
            )
            triple_store.add_relationship(rel)
 
    return entity
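
calculate_quality is left undefined above. One plausible mapping from feedback labels to a [0, 1] score; the labels themselves are assumptions, not a defined schema:

def calculate_quality(feedback: str) -> float:
    """Hypothetical scoring: map feedback labels to a quality score."""
    scores = {
        "approved": 1.0,   # used as-is
        "edited": 0.7,     # useful, but needed changes
        "rejected": 0.2,   # kept with a low score so retrieval avoids it
    }
    return scores.get(feedback, 0.5)  # unknown feedback -> neutral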

Performance Benchmarks

Operation           | Before (Generation) | After (Retrieval) | Speedup
--------------------|---------------------|-------------------|----------------
Contract generation | 30s                 | 3s                | 10x
Response generation | 12s                 | 1.5s              | 8x
Prompt building     | 5s                  | 0.5s              | 10x
Consistency         | Variable            | High              | Quality ↑
Learning ability    | None                | Continuous        | Intelligence ↑

Setup & Installation

Dependencies

# Required
pip install numpy
 
# Vector stores (choose one)
pip install chromadb          # Recommended: easy, Python-native
pip install faiss-cpu         # Alternative: faster, lower-level
 
# Embedding models (choose based on needs)
pip install openai            # For OpenAI embeddings
pip install sentence-transformers  # For free local embeddings
pip install cohere            # For Cohere embeddings

Quick Setup

# 1. Initialize services
from knowledge.embeddings import EmbeddingService, VectorIndex, EmbeddingConfig
from knowledge.index import TripleStore
 
# Use free local model
config = EmbeddingConfig.default()
embedding_service = EmbeddingService(config)
await embedding_service.initialize()
 
vector_index = VectorIndex(embedding_service, backend="chroma")
await vector_index.initialize()
 
triple_store = TripleStore()
 
# 2. Ready to use!

Production Setup

import os
from pathlib import Path
 
# Use OpenAI for production quality
config = EmbeddingConfig.production(api_key=os.getenv("OPENAI_API_KEY"))
embedding_service = EmbeddingService(config)
await embedding_service.initialize()
 
vector_index = VectorIndex(
    embedding_service,
    backend="chroma",
    persist_directory=Path("/data/vector_store")
)
await vector_index.initialize()
 
triple_store = TripleStore(storage_path=Path("/data/triple_store"))

Migration Guide

Converting Existing Systems

Step 1: Identify Generation Points

  • Find where you call LLMs to generate content from scratch
  • Look for llm.generate(), openai.chat.completions.create(), etc.

Step 2: Extract & Store Successful Outputs

# When a generated contract is approved:
await store_with_feedback(
    content=approved_contract,
    entity_type=EntityType.CONTRACT,
    metadata={"tier": "premium", "sport": "basketball"},
    feedback="approved"
)

Step 3: Replace Generation with Retrieval

# Instead of: contract = llm.generate(prompt)
similar = await vector_index.search(
    query=league_description,
    filters={"tier": tier, "sport": sport}
)
contract = composer.assemble(similar[0], customizations)
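
In practice, retrieval should degrade gracefully to generation when no close match exists. A sketch of that fallback; the 0.75 threshold and the feedback="pending" label are assumptions to tune for your data:

async def get_contract(league_description: str, tier: str, sport: str):
    similar = await vector_index.search(
        query=league_description,
        filters={"tier": tier, "sport": sport},
        min_score=0.75,  # assumed threshold; tune against real data
        limit=3
    )
    if similar:
        return composer.assemble(similar[0].text, customizations={})

    # No close match: fall back to generation, then store the result
    # so the next similar query becomes a retrieval hit.
    contract = llm.generate(build_prompt(league_description))
    await store_with_feedback(contract, EntityType.CONTRACT,
                              {"tier": tier, "sport": sport}, feedback="pending")
    return contract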

Best Practices

βœ… DO

  1. Store every success - When something works, save it
  2. Embed immediately - Generate embeddings when storing
  3. Track relationships - Link related entities
  4. Capture feedback - Update quality scores based on outcomes
  5. Generate minimally - Only generate truly novel content

❌ DON'T

  1. Don't regenerate - If it exists, retrieve it
  2. Don't skip metadata - Rich metadata enables better filtering
  3. Don't ignore failures - Store failures with low quality scores to avoid repeating
  4. Don't forget relationships - Graph structure adds intelligence
  5. Don't hard-code - Make everything configurable and retrieval-based

Troubleshooting

Slow Searches

  • Check min_score threshold (lower = more results but slower)
  • Use metadata filters to narrow search space
  • Consider FAISS backend for production (faster than Chroma)
  • Reduce limit parameter

Poor Quality Retrievals

  • Check embedding model quality (try text-embedding-3-large)
  • Ensure metadata is rich and accurate
  • Increase training data (store more examples)
  • Adjust similarity thresholds

Memory Issues

  • Use FAISS with quantization for large datasets (see the sketch after this list)
  • Implement pagination in search results
  • Clear embedding cache periodically: embedding_service.clear_cache()
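
A standalone sketch of FAISS product quantization, independent of the VectorIndex wrapper (the 100-cluster / 8-byte-code parameters are illustrative defaults):

import faiss
import numpy as np

d = 384  # embedding dimensions (e.g. all-MiniLM-L6-v2)
vectors = np.random.rand(10_000, d).astype("float32")  # stand-in for real embeddings

# IVF + product quantization: 100 clusters, 8 sub-quantizers x 8 bits
# = 8 bytes per stored vector instead of 4 * 384 bytes
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 100, 8, 8)
index.train(vectors)  # product quantization needs a training pass
index.add(vectors)

index.nprobe = 10  # clusters probed per query; recall vs. speed trade-off
distances, ids = index.search(vectors[:1], 5)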

Future Enhancements

LangMem Integration

LangMem is essentially a JSON triple-point index plus embedding spaces plus memory management, which maps closely onto the components this system already provides.

# Future integration
from langmem import MemoryStore
 
# LangMem would wrap our existing components
memory = MemoryStore(
    triple_store=our_triple_store,
    vector_index=our_vector_index,
    embedding_service=our_embedding_service
)

Graph Neural Networks

Enhance relationship scoring with GNNs:

# Use graph structure for better ranking
results = await query_engine.search_with_gnn(
    query=query,
    use_graph_context=True,
    graph_weight=0.3  # Balance semantic + graph signals
)

Contributing

When adding new features:

  1. Maintain retrieval-first philosophy
  2. Add embeddings for all new content types
  3. Define entity and relationship types
  4. Include usage examples
  5. Update this README

Support

For questions or issues:

  • Check existing examples in knowledge/examples/
  • Review test files
  • See main database CLAUDE.md for architecture overview
