Source: data_layer/docs/CONSOLIDATION_PLAN_REVISED.md
# Examples Consolidation Plan (REVISED)

## 🎯 Goal

Move the few-shot examples infrastructure to its proper service location (`apps/backend/services/`) and consolidate all example data under `database/output-styles/examples/`.

## Revised Architecture

### ✅ Data Layer: `database/output-styles/examples/`
```
database/output-styles/examples/
├── README.md                    # Master documentation
├── examples_index.json          # Master index (existing)
├── validation-framework.json    # Validation rules (existing)
│
├── by-scenario/                 # Structured JSON (existing)
│   ├── triage.json
│   ├── response-generation.json
│   ├── pdf-processing.json
│   ├── contract-generation.json
│   ├── onboarding.json
│   └── workflow-chain.json
│
├── edge-cases/                  # Edge case examples (existing)
│   └── edge-cases.json
│
├── seeds/                       # 🆕 JSONL files for DB seeding
│   ├── README.md
│   ├── triage.jsonl
│   ├── contract-generation.jsonl
│   ├── pdf-processing.jsonl
│   ├── response-generation.jsonl
│   ├── onboarding-response.jsonl
│   ├── league_examples.jsonl
│   ├── questionnaires.jsonl
│   └── schema_definitions.jsonl
│
└── embeddings/                  # 🆕 Vector embeddings
    ├── README.md
    ├── triage_embeddings.npy
    ├── contract_embeddings.npy
    └── metadata.json
```

### ✅ Service Layer: `apps/backend/services/few_shot_examples/`
```
apps/backend/services/few_shot_examples/
├── __init__.py          # Service exports
├── README.md            # Service documentation
│
├── api.py               # High-level API interface
├── retriever.py         # Core retrieval logic
├── matcher.py           # Semantic similarity matching
├── cache.py             # LRU caching system
├── example_manager.py   # JSONL utilities
│
├── config.py            # Configuration (paths, settings)
├── models.py            # Pydantic models
│
└── tests/
    ├── test_api.py
    ├── test_retriever.py
    ├── test_matcher.py
    └── test_cache.py
```

## Migration Steps
### Phase 1: Create Service Structure ✅

```bash
# Create service directory
mkdir -p apps/backend/services/few_shot_examples/tests

# Create __init__.py files
touch apps/backend/services/few_shot_examples/__init__.py
touch apps/backend/services/few_shot_examples/tests/__init__.py
```

### Phase 2: Move Infrastructure Code ✅
```bash
# Move Python modules
mv database/few_shot_examples_training_data/api.py \
   apps/backend/services/few_shot_examples/
mv database/few_shot_examples_training_data/retriever.py \
   apps/backend/services/few_shot_examples/
mv database/few_shot_examples_training_data/matcher.py \
   apps/backend/services/few_shot_examples/
mv database/few_shot_examples_training_data/cache.py \
   apps/backend/services/few_shot_examples/
mv database/few_shot_examples_training_data/example_manager.py \
   apps/backend/services/few_shot_examples/

# Move tests if they exist.
# Loop over the glob: `[ -f test_*.py ]` errors out when several files match.
for test_file in database/few_shot_examples_training_data/test_*.py; do
  if [ -f "$test_file" ]; then
    mv "$test_file" apps/backend/services/few_shot_examples/tests/
  fi
done
```

### Phase 3: Create Data Structure ✅
```bash
# Create seeds and embeddings directories
mkdir -p database/output-styles/examples/seeds
mkdir -p database/output-styles/examples/embeddings

# Move JSONL files
mv database/few_shot_examples_training_data/data/*.jsonl \
   database/output-styles/examples/seeds/

# Move subdirectories if needed
if [ -d database/few_shot_examples_training_data/data/contract_sections ]; then
  mv database/few_shot_examples_training_data/data/contract_sections \
     database/output-styles/examples/seeds/
fi
```

### Phase 4: Create Configuration ✅
```bash
# Create config.py in service
cat > apps/backend/services/few_shot_examples/config.py << 'EOF'
"""Configuration for Few-Shot Examples Service."""

from pathlib import Path
from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict


class FewShotConfig(BaseSettings):
    """Configuration for Few-Shot Examples Service."""

    # Read FEW_SHOT_* environment variables (pydantic-settings v2 style,
    # replacing the deprecated inner `class Config`)
    model_config = SettingsConfigDict(env_prefix="FEW_SHOT_", case_sensitive=False)

    # Data paths (relative to project root).
    # The derived defaults are computed once from the default EXAMPLES_ROOT;
    # overriding FEW_SHOT_EXAMPLES_ROOT alone does not move them.
    EXAMPLES_ROOT: Path = Path("database/output-styles/examples")
    SEEDS_DIR: Path = EXAMPLES_ROOT / "seeds"
    STRUCTURED_DIR: Path = EXAMPLES_ROOT / "by-scenario"
    EDGE_CASES_DIR: Path = EXAMPLES_ROOT / "edge-cases"
    EMBEDDINGS_DIR: Path = EXAMPLES_ROOT / "embeddings"

    # Cache configuration
    CACHE_MAX_SIZE: int = 2000
    CACHE_TTL_SECONDS: int = 7200  # 2 hours

    # Retrieval configuration
    DEFAULT_MAX_EXAMPLES: int = 5
    DEFAULT_QUALITY_THRESHOLD: float = 0.80

    # Database configuration
    DATABASE_URL: Optional[str] = None


# Global config instance
config = FewShotConfig()
EOF
```

### Phase 5: Update Service Code ✅
```python
# Update imports in each file

# In api.py:
from apps.backend.services.few_shot_examples.config import config
from apps.backend.services.few_shot_examples.retriever import FewShotRetriever
from apps.backend.services.few_shot_examples.cache import ExampleCache

# In retriever.py:
from apps.backend.services.few_shot_examples.config import config
from apps.backend.services.few_shot_examples.matcher import SemanticMatcher
from apps.backend.services.few_shot_examples.cache import ExampleCache

# In example_manager.py:
from apps.backend.services.few_shot_examples.config import config

# Update DEFAULT_DATA_DIR:
DEFAULT_DATA_DIR = config.SEEDS_DIR
```

### Phase 6: Update Service `__init__.py` ✅
```python
# apps/backend/services/few_shot_examples/__init__.py
"""Few-Shot Examples Service.

Provides intelligent retrieval of few-shot examples for prompt engineering.

Usage:
    from apps.backend.services.few_shot_examples import FewShotExamplesAPI

    api = FewShotExamplesAPI()
    examples = await api.get_examples_for_prompt(
        prompt_text="partnership inquiry",
        prompt_type="triage",
        max_examples=5
    )
"""

from .api import FewShotExamplesAPI
from .retriever import FewShotRetriever, RetrievalContext, RetrievalStrategy
from .matcher import SemanticMatcher
from .cache import ExampleCache
from .example_manager import ExampleManager
from .config import config, FewShotConfig

__all__ = [
    "FewShotExamplesAPI",
    "FewShotRetriever",
    "RetrievalContext",
    "RetrievalStrategy",
    "SemanticMatcher",
    "ExampleCache",
    "ExampleManager",
    "config",
    "FewShotConfig",
]
```

### Phase 7: Update Scripts ✅
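For context on what the seed script consumes from the seeds directory, reading JSONL files might look like the sketch below. It is a minimal illustration, not the actual `ExampleManager`: the function name `iter_seed_records` is hypothetical, and the only assumption about the files is that each non-empty line holds one JSON object.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_seed_records(seeds_dir: Path) -> Iterator[tuple[str, dict]]:
    """Yield (file stem, record) pairs from every *.jsonl file in seeds_dir."""
    for path in sorted(seeds_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    record = json.loads(line)
                except json.JSONDecodeError as exc:
                    # Report the file and line so a bad seed is easy to locate
                    raise ValueError(
                        f"{path.name}:{line_number}: invalid JSON ({exc.msg})"
                    ) from exc
                yield path.stem, record
```

The file stem (for example `triage`) can serve as a category label when the records themselves do not carry one; whether the real seed files do is not stated in this plan.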
```python
# In scripts/seed.examples.py
from apps.backend.services.few_shot_examples import ExampleManager, config

# Update paths
JSONL_DIR = config.SEEDS_DIR
```

### Phase 8: Update Existing Imports ✅
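The search-and-replace in this phase can also be scripted. The sketch below is one possible approach and not part of the original plan: the function names are hypothetical, and it assumes the old package is referenced only through `from ... import` and `import ...` statements.

```python
import re
from pathlib import Path

OLD_PACKAGE = "database.few_shot_examples_training_data"
NEW_PACKAGE = "apps.backend.services.few_shot_examples"

# Match the old package only at the start of an import statement,
# optionally followed by a submodule (".cache") or whitespace.
_IMPORT_RE = re.compile(
    rf"^([ \t]*(?:from|import)[ \t]+){re.escape(OLD_PACKAGE)}(?=[\s.]|$)",
    flags=re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Return source with old-package imports pointed at the new package."""
    return _IMPORT_RE.sub(rf"\g<1>{NEW_PACKAGE}", source)


def rewrite_tree(root: Path, dry_run: bool = True) -> list[Path]:
    """Rewrite .py files under root; return the files that change."""
    changed = []
    for path in sorted(root.rglob("*.py")):
        original = path.read_text(encoding="utf-8")
        updated = rewrite_imports(original)
        if updated != original:
            changed.append(path)
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return changed
```

Running `rewrite_tree(Path("apps/backend"))` with the default `dry_run=True` only lists the affected files, which makes it easy to compare against the `grep` output before writing anything.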
```bash
# Find all files that import from old location
grep -r "from database.few_shot_examples_training_data" apps/backend/
```

Update each import:

```python
# OLD:
from database.few_shot_examples_training_data import FewShotExamplesAPI

# NEW:
from apps.backend.services.few_shot_examples import FewShotExamplesAPI
```

### Phase 9: Clean Up Old Structure ✅
```bash
# Remove old directory (after verifying migration)
rm -rf database/few_shot_examples_training_data/

# Update .gitignore if needed
echo "database/output-styles/examples/embeddings/*.npy" >> .gitignore
```

## Create Documentation
### Service README

````markdown
# Few-Shot Examples Service

## Location

`apps/backend/services/few_shot_examples/`

## Purpose

Provides intelligent retrieval of few-shot examples for prompt engineering across the AltSports Data platform.

## Architecture

### Service Layer (this directory)

- **API**: High-level interface for example retrieval
- **Retriever**: Core retrieval logic with multiple strategies
- **Matcher**: Semantic similarity matching
- **Cache**: LRU caching for performance
- **Example Manager**: JSONL file utilities

### Data Layer (`database/output-styles/examples/`)

- **seeds/**: JSONL files for database seeding
- **by-scenario/**: Structured JSON examples
- **edge-cases/**: Edge case handling
- **embeddings/**: Vector embeddings

## Usage

```python
from apps.backend.services.few_shot_examples import FewShotExamplesAPI

api = FewShotExamplesAPI()

# Get examples for a prompt (call from inside an async function)
examples = await api.get_examples_for_prompt(
    prompt_text="partnership inquiry from premium soccer league",
    prompt_type="triage",
    business_tier="premium",
    sport_type="soccer",
    max_examples=3
)

# Direct Prisma query
from prisma import Prisma

db = Prisma()
await db.connect()
examples = await db.fewshotexample.find_many(
    where={"category": "triage", "tier": "premium"},
    order={"quality_score": "desc"},  # Prisma Client Python uses `order`, not `order_by`
    take=5
)
```

## Configuration

Set via environment variables:

```bash
FEW_SHOT_CACHE_MAX_SIZE=2000
FEW_SHOT_CACHE_TTL_SECONDS=7200
FEW_SHOT_DEFAULT_MAX_EXAMPLES=5
```

Or via code:

```python
from apps.backend.services.few_shot_examples import config

config.CACHE_MAX_SIZE = 3000
```

## Data Management

### Seed Database

```bash
uv run python scripts/seed.examples.py --category triage
```

### Add Examples

- Edit JSONL file in `database/output-styles/examples/seeds/`
- Run seed script
- Verify via API or Prisma

## Testing

```bash
# Run service tests
pytest apps/backend/services/few_shot_examples/tests/

# Test API
python -c "from apps.backend.services.few_shot_examples import FewShotExamplesAPI; print('✅ Service import works')"
```
````
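The Service README names an LRU cache bounded by `CACHE_MAX_SIZE` and `CACHE_TTL_SECONDS` but does not show how the two limits interact. As a rough sketch of that behavior only: the class below is hypothetical and is not the actual `ExampleCache`; it assumes entries expire a fixed number of seconds after they are stored and that the least recently used entry is evicted once the size limit is exceeded.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLLRUCache:
    """Illustrative LRU cache with per-entry expiry (not the real ExampleCache)."""

    def __init__(
        self,
        max_size: int = 2000,
        ttl_seconds: float = 7200,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (stored_at, value), oldest first

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

The defaults mirror the values in `config.py` (2000 entries, 7200 seconds); a cache key built from the prompt type and tier would be one plausible choice, though the plan does not say how keys are formed.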
### Data README

````markdown
# Few-Shot Examples Data

## Location

`database/output-styles/examples/`

## Structure

- **by-scenario/**: Human-readable reference examples (JSON)
- **edge-cases/**: Edge case handling examples (JSON)
- **seeds/**: Database seed files (JSONL)
- **embeddings/**: Vector embeddings for semantic search (NPY)

## Retrieval Methods

### 1. Direct JSON Reading

Fast file reading for documentation and reference:

```python
import json
from pathlib import Path

# Use a context manager so the file handle is closed after loading
with Path("database/output-styles/examples/by-scenario/triage.json").open() as f:
    examples = json.load(f)
```

### 2. Database Queries

Fast indexed queries via Prisma:

```python
from apps.backend.services.few_shot_examples import FewShotExamplesAPI

api = FewShotExamplesAPI()
examples = await api.get_examples_for_prompt(
    prompt_text="partnership inquiry",
    prompt_type="triage"
)
```

### 3. Semantic Search (Future)

Vector embedding similarity:

```python
from apps.backend.services.few_shot_examples import SemanticMatcher

matcher = SemanticMatcher()
similar = await matcher.find_similar(
    "partnership inquiry from premium league",
    top_k=5
)
```

## Maintenance

- **Add Examples**: Edit JSONL in `seeds/`, run seed script
- **Update Examples**: Edit JSONL, reseed (upserts existing)
- **Quality Review**: Quarterly review of examples
````
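The "Semantic Search (Future)" method in the Data README ranks examples by embedding similarity. As a rough sketch of that idea only: the function names and the in-memory dictionary of vectors below are hypothetical, and cosine similarity is an assumed metric, since the plan does not specify how `SemanticMatcher` scores matches.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    embeddings: dict[str, Sequence[float]],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Return the top_k (example_id, score) pairs, best match first."""
    scored = [
        (example_id, cosine_similarity(query, vector))
        for example_id, vector in embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

With real data the vectors would presumably come from the `.npy` files under `embeddings/`, with `metadata.json` mapping rows to example IDs; that mapping is an assumption here and is left out of the sketch.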
## 🎯 Benefits of Revised Architecture

### ✅ Clean Separation

```
Database Layer: database/output-styles/examples/           # Data only
Service Layer:  apps/backend/services/few_shot_examples/   # Logic only
```

### ✅ Proper Service Architecture

- Aligns with `apps/backend/services/` pattern
- Clear service boundaries and responsibilities
- Easy to test, mock, and maintain

### ✅ Consistent with Backend Structure

```
apps/backend/
├── agents/                  # Agent implementations
├── api/                     # API endpoints
├── services/                # Business logic services
│   ├── few_shot_examples/   # ← Our service here
│   ├── league_processing/
│   └── contract_generation/
└── routers/                 # Route handlers
```
### ✅ Import Clarity

```python
# Clear, semantic imports
from apps.backend.services.few_shot_examples import FewShotExamplesAPI
from apps.backend.services.contract_generation import ContractGenerator
from apps.backend.services.league_processing import LeagueProcessor
```

## Migration Comparison
| Aspect | Before | After |
|---|---|---|
| Service Location | database/few_shot_examples_training_data/ | apps/backend/services/few_shot_examples/ |
| Data Location | database/few_shot_examples_training_data/data/ | database/output-styles/examples/seeds/ |
| Architecture | Mixed data + logic | Clean separation |
| Consistency | Inconsistent with project | Follows service pattern |
| Imports | Rooted in a data directory | Rooted in the services package |

## ✅ Validation Checklist

- Service directory created: `apps/backend/services/few_shot_examples/`
- Infrastructure code moved to service
- JSONL files moved to `database/output-styles/examples/seeds/`
- Configuration file created (`config.py`)
- All imports updated
- Seed script updated
- Tests moved and updated
- Documentation created
- Old directory removed
- Service imports work: `from apps.backend.services.few_shot_examples import FewShotExamplesAPI`
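
Several checklist items can be verified from the filesystem alone. The script below is a minimal sketch of such a check, not part of the original plan: the function name `check_migration` is hypothetical, the module list is taken from Phase 2, and items such as "All imports updated" or "Documentation created" still need manual review.

```python
from pathlib import Path

SERVICE_DIR = Path("apps/backend/services/few_shot_examples")
SEEDS_DIR = Path("database/output-styles/examples/seeds")
OLD_DIR = Path("database/few_shot_examples_training_data")

# Modules that Phase 2 moves into the service directory
INFRA_MODULES = ("api.py", "retriever.py", "matcher.py", "cache.py", "example_manager.py")


def check_migration(project_root: Path) -> dict[str, bool]:
    """Map each filesystem-checkable checklist item to pass/fail."""
    service = project_root / SERVICE_DIR
    seeds = project_root / SEEDS_DIR
    return {
        "service directory created": service.is_dir(),
        "infrastructure code moved": all(
            (service / name).is_file() for name in INFRA_MODULES
        ),
        "JSONL files moved": seeds.is_dir() and any(seeds.glob("*.jsonl")),
        "configuration file created": (service / "config.py").is_file(),
        "old directory removed": not (project_root / OLD_DIR).exists(),
    }


if __name__ == "__main__":
    # Run from the project root; prints one checkbox line per item
    for item, ok in check_migration(Path.cwd()).items():
        print(f"[{'x' if ok else ' '}] {item}")
```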

## Quick Migration Command

```bash
#!/bin/bash
# migrate-few-shot-service.sh
set -e

echo "Migrating Few-Shot Examples to proper service location..."

# 1. Create service structure
mkdir -p apps/backend/services/few_shot_examples/tests

# 2. Move infrastructure (test files first, so they land in tests/ as in Phase 2)
for test_file in database/few_shot_examples_training_data/test_*.py; do
  if [ -f "$test_file" ]; then
    mv "$test_file" apps/backend/services/few_shot_examples/tests/
  fi
done
mv database/few_shot_examples_training_data/*.py \
   apps/backend/services/few_shot_examples/ 2>/dev/null || true

# 3. Create data structure
mkdir -p database/output-styles/examples/seeds
mkdir -p database/output-styles/examples/embeddings

# 4. Move data
mv database/few_shot_examples_training_data/data/*.jsonl \
   database/output-styles/examples/seeds/ 2>/dev/null || true

# 5. Create config
cat > apps/backend/services/few_shot_examples/config.py << 'EOF'
# Config content here...
EOF

# 6. Update imports
echo "⚠️ Manual step: Update imports in your code"
echo "   OLD: from database.few_shot_examples_training_data import ..."
echo "   NEW: from apps.backend.services.few_shot_examples import ..."

# 7. Clean up
echo "🧹 After verifying migration, remove old directory:"
echo "   rm -rf database/few_shot_examples_training_data/"

echo "✅ Migration structure created!"
echo "Next: Update imports and test"
```

## Summary

**Architecture Change:**

```
❌ database/few_shot_examples_training_data/   (mixed data + logic)
✅ apps/backend/services/few_shot_examples/    (service logic)
✅ database/output-styles/examples/            (data only)
```

**Key Improvements:**

- ✅ Proper service architecture
- ✅ Clean data/logic separation
- ✅ Consistent with project patterns
- ✅ Better imports and discoverability
- ✅ Easier to test and maintain