Source: data_layer/docs/RETRIEVAL_SYSTEM_README.md
# Knowledge Retrieval System

## Overview
This system implements a retrieval-first architecture that prioritizes compression, storage, and retrieval over generation. Instead of regenerating content from scratch each time, we store successful outputs and retrieve modular, reusable components.
## Philosophy

```
OLD: Query → Generate from scratch → Return          (30s, inconsistent)
NEW: Query → Embed → Match → Retrieve → Compose      (3s, consistent, learning)
```

### Core Principles
- **Store Every Success**: Contracts, responses, and prompts that work well
- **Embed Immediately**: Generate vector embeddings when storing content
- **Triple-Point Index**: Link entities → metadata → embeddings
- **Generate Minimally**: Only generate what can't be retrieved or composed
- **Close the Loop**: Capture feedback to improve retrieval quality
## Architecture

### Three-Tier System
```
┌───────────────────────────────────────────────────────────────┐
│                        Query Interface                        │
└───────────────────────────────────────────────────────────────┘
                               │
┌───────────────────────────────────────────────────────────────┐
│                  Query Engine (Multi-Modal)                   │
│   • Semantic search (embeddings)                              │
│   • Graph queries (relationships)                             │
│   • Metadata filters                                          │
│   • Hybrid scoring                                            │
└───────────────────────────────────────────────────────────────┘
                               │
                ┌──────────────┴──────────────┐
                │                             │
    ┌──────────────────────┐      ┌──────────────────────┐
    │     Vector Index     │      │     Triple Store     │
    │     (Embeddings)     │ ◄──► │   (Relationships)    │
    │                      │      │                      │
    │  • Chroma/FAISS      │      │  • Entities          │
    │  • Semantic search   │      │  • Metadata          │
    │  • Fast similarity   │      │  • Graph queries     │
    └──────────────────────┘      └──────────────────────┘
```
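Hybrid scoring is only named in the diagram above; conceptually it blends the two stores' signals: semantic similarity from the vector index and graph proximity from the triple store. A minimal sketch of the idea, with illustrative weights and helper names (not the actual `query_engine.py` API):

```python
# Illustrative only: blend a vector-similarity score with a
# graph-proximity signal from the triple store.
def hybrid_score(semantic_score: float, graph_hops: int,
                 semantic_weight: float = 0.7) -> float:
    # Fewer hops from a known-good entity -> stronger graph signal.
    graph_score = 1.0 / (1.0 + graph_hops)
    return semantic_weight * semantic_score + (1.0 - semantic_weight) * graph_score

# e.g. a strong semantic match one hop from a high-quality contract:
print(hybrid_score(semantic_score=0.82, graph_hops=1))  # 0.724
```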
### Directory Structure

```
knowledge/
├── embeddings/                  # Vector embedding service
│   ├── __init__.py
│   ├── config.py                # Model configurations
│   ├── service.py               # Embedding generation
│   └── index.py                 # Vector similarity search (Chroma/FAISS)
│
├── index/                       # Triple-point relationship storage
│   ├── __init__.py
│   ├── triple_store.py          # Entity & relationship management
│   ├── query_engine.py          # Multi-modal querying
│   └── update_service.py        # Incremental updates & feedback loops
│
├── examples/                    # Few-shot examples (JSONL + embeddings)
│   ├── data/                    # JSONL storage
│   ├── retriever.py             # Existing retrieval logic
│   ├── matcher.py               # Similarity matching
│   └── cache.py                 # LRU caching
│
├── schemas/                     # Schema definitions
├── templates/                   # Reusable modular components
└── RETRIEVAL_SYSTEM_README.md   # This file
```

## Components
### 1. Embedding Service (`embeddings/`)
Generates vector embeddings for semantic similarity search.
**Supported Models:**
- **OpenAI**: ada-002, text-embedding-3-small, text-embedding-3-large
- **Local (free)**: all-MiniLM-L6-v2, all-mpnet-base-v2
- **Cohere**: embed-english-v3.0
- **Google**: textembedding-gecko
**Quick Start:**

```python
from knowledge.embeddings import EmbeddingService, EmbeddingConfig, ModelType

# Fast local embedding (free)
config = EmbeddingConfig.from_model_type(ModelType.MINILM_L6)
service = EmbeddingService(config)
await service.initialize()

embedding = await service.embed("Premium basketball league contract")
# Returns: Embedding(text="...", vector=np.array([...]), dimensions=384)
```

**Production Setup:**
```python
# OpenAI (good quality + speed balance)
config = EmbeddingConfig.production(api_key="sk-...")
service = EmbeddingService(config)

# Batch processing
embeddings = await service.embed([
    "Contract template for tier 1 leagues",
    "Response template for partnership inquiries",
    "Onboarding workflow for new leagues"
])
```

### 2. Vector Index (`embeddings/index.py`)
Fast similarity search using Chroma or FAISS.
**Features:**
- Semantic search via vector similarity
- Metadata filtering
- Persistent storage
- Both Chroma (easy) and FAISS (fast) backends

**Quick Start:**
```python
from knowledge.embeddings import EmbeddingService, VectorIndex

service = EmbeddingService()
await service.initialize()

index = VectorIndex(service, backend="chroma")
await index.initialize()

# Add documents
await index.add(
    texts=["Premium basketball contract", "Standard soccer template"],
    ids=["contract_001", "template_002"],
    metadatas=[
        {"tier": "premium", "sport": "basketball"},
        {"tier": "standard", "sport": "soccer"}
    ]
)

# Search
results = await index.search(
    query="high-tier basketball agreement",
    limit=5,
    min_score=0.7,
    filters={"sport": "basketball"}
)

for result in results:
    print(f"{result.id}: {result.score:.2f} - {result.text[:50]}")
```

### 3. Triple Store (`index/triple_store.py`)
Manages entity relationships and metadata.
**Entity Types:** `Contract`, `Template`, `Prompt`, `Example`, `League`, `Response`, `Document`

**Relationship Types:** `based_on`, `variant_of`, `used_in`, `similar_to`, `contains`, `derived_from`, `supersedes`

**Quick Start:**
```python
from knowledge.index import TripleStore, Entity, Relationship, EntityType, RelationType

store = TripleStore()

# Add entities
contract = Entity(
    id="contract_premium_basketball_001",
    type=EntityType.CONTRACT,
    name="Premium Basketball League Contract",
    content_ref="knowledge/examples/data/contracts.jsonl#47",
    embedding_ref="emb_contract_prem_bball_001",
    metadata={
        "tier": "premium",
        "sport": "basketball",
        "success_rate": 0.92,
        "usage_count": 47
    }
)
store.add_entity(contract)

# Add relationships
template = Entity(
    id="template_basketball_base",
    type=EntityType.TEMPLATE,
    name="Basketball Contract Template",
    content_ref="knowledge/templates/basketball_contract.json"
)
store.add_entity(template)

rel = Relationship(
    source_id="contract_premium_basketball_001",
    target_id="template_basketball_base",
    type=RelationType.BASED_ON,
    weight=1.0
)
store.add_relationship(rel)

# Query relationships
related = store.get_related_entities(
    entity_id="contract_premium_basketball_001",
    rel_type=RelationType.BASED_ON,
    max_hops=2
)
```

## Usage Patterns
### Pattern 1: Contract Generation → Retrieval + Composition

**Before (Generation):**
```python
def generate_contract(league_data):
    prompt = build_prompt(league_data)     # 2s  - regenerate
    sections = llm.generate(prompt)        # 25s - LLM call
    contract = assemble(sections)          # 3s  - assembly
    return contract                        # Total: 30s
```

**After (Retrieval):**
```python
async def retrieve_and_compose_contract(league_data):
    # 1. Find similar successful contracts (0.5s)
    similar = await vector_index.search(
        query=league_data.semantic_description,
        filters={"tier": league_data.tier, "sport": league_data.sport},
        limit=3
    )

    # 2. Get relationships (0.2s)
    base_entity_id = similar[0].metadata['entity_id']
    related = triple_store.get_related_entities(
        entity_id=base_entity_id,
        rel_type=RelationType.BASED_ON
    )

    # 3. Compose from modules (2s)
    contract = composer.assemble(
        base_template=similar[0].text,
        customizations=league_data.specific_terms,
        generate_only=["custom_clauses"]  # Minimal LLM use
    )

    # 4. Store for future retrieval (0.3s)
    await store_successful_contract(contract, league_data)

    return contract  # Total: 3s
```
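The `composer.assemble` call above is the composition step. A minimal sketch of what it might do, assuming bracketed `[section]` markers in templates and an `llm` client with a `generate()` method (both names are illustrative, not the actual composer API):

```python
# A hypothetical composer: fill a retrieved base template with known
# terms, and call the LLM only for sections listed in generate_only.
class Composer:
    def __init__(self, llm):
        self.llm = llm

    def assemble(self, base_template: str, customizations: dict,
                 generate_only: list[str] | None = None) -> str:
        contract = base_template
        # Substitute known terms without any LLM involvement.
        for section, text in customizations.items():
            contract = contract.replace(f"[{section}]", text)
        # Generate only the truly novel sections.
        for section in generate_only or []:
            generated = self.llm.generate(f"Draft the {section} section.")
            contract = contract.replace(f"[{section}]", generated)
        return contract
```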
### Pattern 2: Response Generation → Template Retrieval

**Before:**
```python
def generate_response(email):
    classification = classify(email)       # 3s
    prompt = build_response_prompt(email)  # 1s
    response = llm.generate(prompt)        # 8s
    return response                        # Total: 12s
```

**After:**
```python
async def retrieve_response_template(email):
    # Embed email content
    email_embedding = await embedding_service.embed(email.body)

    # Find similar past responses (0.5s)
    similar_responses = await vector_index.search(
        query=email_embedding,
        filters={"category": email.category, "tier": email.tier},
        limit=3
    )

    # Adapt template with minimal generation (1s)
    response = adapt_template(
        template=similar_responses[0].text,
        context={"league_name": email.league_name}
    )

    return response  # Total: 1.5s
```
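`adapt_template` isn't defined above; a minimal sketch, assuming plain `{placeholder}` substitution (a small LLM pass could handle any placeholder the context doesn't cover):

```python
# Naive placeholder substitution: replaces {key} markers in the
# retrieved template with values from the context dict.
def adapt_template(template: str, context: dict) -> str:
    response = template
    for key, value in context.items():
        response = response.replace("{" + key + "}", str(value))
    return response
```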
### Pattern 3: Feedback Loop → Continuous Learning

```python
async def store_with_feedback(content, entity_type, metadata, feedback):
    """Store successful outputs and update quality scores."""
    # 1. Generate embedding
    embedding = await embedding_service.embed(content)

    # 2. Create entity
    entity = Entity(
        id=f"{entity_type}_{generate_id()}",
        type=entity_type,
        name=metadata.get('name', 'Unnamed'),
        content_ref=f"storage/{entity_type}s.jsonl#{get_line_number()}",
        embedding_ref=f"emb_{entity_type}_{generate_id()}",
        metadata={
            **metadata,
            "quality_score": calculate_quality(feedback),
            "created_at": datetime.now().isoformat()
        }
    )
    triple_store.add_entity(entity)

    # 3. Add to vector index
    await vector_index.add(
        texts=[content],
        ids=[entity.id],
        metadatas=[entity.metadata],
        embeddings=[embedding.vector]
    )

    # 4. Create relationships to similar entities
    similar = await vector_index.search(content, limit=5, min_score=0.8)
    for result in similar[1:]:  # Skip self
        if result.score > 0.85:
            rel = Relationship(
                source_id=entity.id,
                target_id=result.id,
                type=RelationType.SIMILAR_TO,
                weight=result.score
            )
            triple_store.add_relationship(rel)

    return entity
```

## Performance Benchmarks
| Operation | Before (Generation) | After (Retrieval) | Improvement |
|---|---|---|---|
| Contract generation | 30s | 3s | 10x |
| Response generation | 12s | 1.5s | 8x |
| Prompt building | 5s | 0.5s | 10x |
| Consistency | Variable | High | Quality ↑ |
| Learning ability | None | Continuous | Intelligence ↑ |
## Setup & Installation

### Dependencies
```bash
# Required
pip install numpy

# Vector stores (choose one)
pip install chromadb               # Recommended: easy, Python-native
pip install faiss-cpu              # Alternative: faster, lower-level

# Embedding models (choose based on needs)
pip install openai                 # For OpenAI embeddings
pip install sentence-transformers  # For free local embeddings
pip install cohere                 # For Cohere embeddings
```

### Quick Setup
```python
# 1. Initialize services
from knowledge.embeddings import EmbeddingService, VectorIndex, EmbeddingConfig
from knowledge.index import TripleStore

# Use free local model
config = EmbeddingConfig.default()
embedding_service = EmbeddingService(config)
await embedding_service.initialize()

vector_index = VectorIndex(embedding_service, backend="chroma")
await vector_index.initialize()

triple_store = TripleStore()

# 2. Ready to use!
```

### Production Setup
```python
import os
from pathlib import Path

# Use OpenAI for production quality
config = EmbeddingConfig.production(api_key=os.getenv("OPENAI_API_KEY"))
embedding_service = EmbeddingService(config)
await embedding_service.initialize()

vector_index = VectorIndex(
    embedding_service,
    backend="chroma",
    persist_directory=Path("/data/vector_store")
)
await vector_index.initialize()

triple_store = TripleStore(storage_path=Path("/data/triple_store"))
```

## Migration Guide
### Converting Existing Systems

**Step 1: Identify Generation Points**
- Find where you call LLMs to generate content from scratch
- Look for `llm.generate()`, `openai.chat.completions.create()`, and similar call sites (a search sketch follows this list)
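One way to take that inventory is a short script like this (a sketch; the regex covers only the two call patterns named above, so extend it for your own wrappers):

```python
# List every line in the repo that calls a generation API.
import re
from pathlib import Path

PATTERN = re.compile(r"llm\.generate\(|chat\.completions\.create\(")

for path in Path(".").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if PATTERN.search(line):
            print(f"{path}:{lineno}: {line.strip()}")
```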
**Step 2: Extract & Store Successful Outputs**

```python
# When a generated contract is approved:
await store_with_feedback(
    content=approved_contract,
    entity_type=EntityType.CONTRACT,
    metadata={"tier": "premium", "sport": "basketball"},
    feedback="approved"
)
```

**Step 3: Replace Generation with Retrieval**
```python
# Instead of: contract = llm.generate(prompt)
similar = await vector_index.search(
    query=league_description,
    filters={"tier": tier, "sport": sport}
)
contract = composer.assemble(similar[0], customizations)
```

## Best Practices
### ✅ DO
- **Store every success**: When something works, save it
- **Embed immediately**: Generate embeddings when storing
- **Track relationships**: Link related entities
- **Capture feedback**: Update quality scores based on outcomes
- **Generate minimally**: Only generate truly novel content
### ❌ DON'T
- **Don't regenerate**: If it exists, retrieve it
- **Don't skip metadata**: Rich metadata enables better filtering
- **Don't ignore failures**: Store failures with low quality scores to avoid repeating them (see the sketch after this list)
- **Don't forget relationships**: Graph structure adds intelligence
- **Don't hard-code**: Make everything configurable and retrieval-based
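For instance, a rejected draft can flow through the same `store_with_feedback` helper from Pattern 3, assuming `calculate_quality` maps negative feedback to a low score:

```python
# Record a rejected draft so retrieval can down-rank or filter it later.
await store_with_feedback(
    content=rejected_contract,
    entity_type=EntityType.CONTRACT,
    metadata={"tier": "premium", "sport": "basketball", "outcome": "rejected"},
    feedback="rejected"  # assumed to yield a low quality_score
)

# At query time, drop low-quality entries from the candidates:
results = await vector_index.search(
    query=league_description,
    filters={"sport": "basketball"},
    min_score=0.7
)
results = [r for r in results if r.metadata.get("quality_score", 0) >= 0.5]
```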
## Troubleshooting

### Slow Searches
- Check the `min_score` threshold (lower = more results but slower)
- Use metadata filters to narrow the search space
- Consider the FAISS backend for production (faster than Chroma)
- Reduce the `limit` parameter (a combined example follows this list)
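Combining those knobs with the search API shown earlier (the query and filter values are illustrative):

```python
# Narrow the candidate set with metadata, keep the threshold moderate,
# and cap the result count.
results = await vector_index.search(
    query="premium basketball renewal terms",
    filters={"sport": "basketball", "tier": "premium"},
    min_score=0.75,  # higher threshold = fewer, faster results
    limit=3
)
```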
### Poor Quality Retrievals
- Check embedding model quality (try OpenAI text-embedding-3-large; see the sketch after this list)
- Ensure metadata is rich and accurate
- Increase training data (store more examples)
- Adjust similarity thresholds
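For example, switching the service to a stronger model (the enum member name is an assumption; check `embeddings/config.py` for the actual one):

```python
# Swap to a higher-quality embedding model. TEXT_EMBEDDING_3_LARGE is a
# hypothetical ModelType member, alongside MINILM_L6 shown earlier.
config = EmbeddingConfig.from_model_type(ModelType.TEXT_EMBEDDING_3_LARGE)
service = EmbeddingService(config)
await service.initialize()
# Note: vectors from different models live in different spaces, so
# existing index entries must be re-embedded after a model change.
```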
### Memory Issues
- Use FAISS with quantization for large datasets
- Implement pagination in search results (sketch below)
- Clear the embedding cache periodically: `embedding_service.clear_cache()`
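A simple pagination sketch that uses only the `search` API shown above: over-fetch and slice. Workable for small pages; a native offset parameter would be more efficient:

```python
# Fetch enough results to cover the requested page, then slice out
# just that page. page is zero-indexed.
async def search_page(index, query, page: int, page_size: int = 10):
    results = await index.search(query=query, limit=page_size * (page + 1))
    return results[page * page_size : (page + 1) * page_size]
```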
## Future Enhancements

### LangMem Integration
LangMem is essentially a JSON triple-point index plus embedding spaces plus memory management.
```python
# Future integration
from langmem import MemoryStore

# LangMem would wrap our existing components
memory = MemoryStore(
    triple_store=our_triple_store,
    vector_index=our_vector_index,
    embedding_service=our_embedding_service
)
```

### Graph Neural Networks
Enhance relationship scoring with GNNs:

```python
# Use graph structure for better ranking
results = await query_engine.search_with_gnn(
    query=query,
    use_graph_context=True,
    graph_weight=0.3  # Balance semantic + graph signals
)
```

## Contributing
When adding new features:
- Maintain the retrieval-first philosophy
- Add embeddings for all new content types
- Define entity and relationship types
- Include usage examples
- Update this README
## Support

For questions or issues:
- Check existing examples in `knowledge/examples/`
- Review test files
- See the main database CLAUDE.md for an architecture overview