Source: data_layer/docs/MIGRATION_GUIDE_PRACTICAL.md
# 🎯 Practical Migration Guide

**Goal**: Reorganize `data_fabric/` from mixed organization to hybrid lifecycle structure
**Timeline**: 3 weeks (non-breaking, incremental)

## 📋 Pre-Migration Checklist

### 1. Backup Current State
```bash
# Create timestamped backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
tar -czf "data_fabric_backup_${TIMESTAMP}.tar.gz" data_fabric/
echo "Backup created: data_fabric_backup_${TIMESTAMP}.tar.gz"
```

### 2. Document Current Import Paths
```bash
# Find all Python imports referencing data_fabric
grep -r "from data_fabric" . --include="*.py" > migration_imports_before.txt
grep -r "import data_fabric" . --include="*.py" >> migration_imports_before.txt

# Find all file references in configs
grep -r "data_fabric/" . --include="*.json" --include="*.yaml" > migration_paths_before.txt
```

### 3. Create .gitignore for views/
```bash
# Add to data_fabric/.gitignore
cat >> data_fabric/.gitignore << 'EOF'
# Generated outputs (views/)
views/*
!views/README.md
!views/**/README.md

# Runtime caches
weave/storage/examples/data/
weave/knowledge/storage/cache/
EOF
```

## 🏗️ Week 1: Build New Structure (Non-Breaking)

### Phase 1A: Create Directory Structure
```bash
# Execute this script to create all directories at once
cat > scripts/create_new_structure.sh << 'EOF'
#!/bin/bash
set -e

echo "Creating new directory structure..."

# Level 1: definitions/
mkdir -p data_fabric/definitions/{schemas,config,templates,examples,catalog}

# schemas/ (already exists, just ensure structure)
mkdir -p data_fabric/definitions/schemas/{domain,generated,seeds}
mkdir -p data_fabric/definitions/schemas/generated/{drizzle,pydantic,typescript}

# config/
mkdir -p data_fabric/definitions/config/{business,sports,pipeline}
mkdir -p data_fabric/definitions/config/business/{pricing,scoring,contracts}

# templates/
mkdir -p data_fabric/definitions/templates/{prompts,contracts}
mkdir -p data_fabric/definitions/templates/prompts/{onboarding,contracts,classification,components}

# examples/
mkdir -p data_fabric/definitions/examples/{onboarding,sports_classification}
mkdir -p data_fabric/definitions/examples/onboarding/{questionnaire_extraction,tier_classification,contract_assembly}

# catalog/ (if kb_catalog should be here)
mkdir -p data_fabric/definitions/catalog/{constants,registry,manifests}

# Level 2: weave/
mkdir -p data_fabric/weave/{knowledge,storage,prompts,generators,validators}

# knowledge/ (already exists, ensure subdirs)
mkdir -p data_fabric/weave/knowledge/{embeddings,intent,retrieval,storage,templates}

# storage/ (already exists, ensure subdirs)
mkdir -p data_fabric/weave/storage/{examples,postgres,redis,supabase}

# prompts/
mkdir -p data_fabric/weave/prompts/{builders,registry}

# generators/ (new)
mkdir -p data_fabric/weave/generators

# validators/ (new)
mkdir -p data_fabric/weave/validators

# Level 3: views/
mkdir -p data_fabric/views/{onboarding,analytics,contracts,uploads}
mkdir -p data_fabric/views/onboarding/{02-ingest-validate-questionnaire,03-enhance-documents,04-classify-and-score,05-upsert-and-crossref,06-suggest-tiers-and-terms,07-assemble-contract,07a-output-contract-export,07b-output-gamekeeper-scorekeeper-ui,07c-output-marketing-nxt-onboarding-materials}

echo "✅ Directory structure created successfully"
EOF

chmod +x scripts/create_new_structure.sh
./scripts/create_new_structure.sh
```

### Phase 1B: Create README Files
````bash
# Create READMEs to explain each directory
cat > scripts/create_readmes.sh << 'EOF'
#!/bin/bash

# definitions/ README
cat > data_fabric/definitions/README.md << 'INNER_EOF'
# definitions/ - Source of Truth

This directory contains all **canonical, version-controlled definitions**.

## Structure
- `schemas/` - Data structure definitions (JSON Schema, SQL DDL)
- `config/` - Business rules and configuration files
- `templates/` - Jinja2/Mustache templates for prompts and documents
- `examples/` - Training data and reference examples (JSONL)
- `catalog/` - System metadata and inventories

## Principles
- ✅ All files are **git-tracked**
- ✅ Files are **immutable** (don't change at runtime)
- ✅ These are **sources**, not generated outputs
- ✅ Changes require code review and versioning

## What Goes Here?
- Hand-written schemas
- Business configuration (pricing, scoring rules)
- Prompt templates
- Training examples for ML/LLM
- System constants and enums
INNER_EOF

# weave/ README
cat > data_fabric/weave/README.md << 'INNER_EOF'
# weave/ - Operational Runtime

This directory contains all **operational Python code** that runs the system.

## Structure
- `knowledge/` - AI/ML operations (embeddings, RAG, intent classification)
- `storage/` - Database operations (PostgreSQL, Redis, Supabase)
- `prompts/` - Dynamic prompt assembly and building
- `generators/` - Data transformation pipelines
- `validators/` - Data validation logic

## Principles
- ✅ All files are **Python modules** (.py)
- ✅ Code is **imported and executed**
- ✅ These are **operational services**, not data
- ✅ Well-tested with unit tests

## What Goes Here?
- Python modules that perform operations
- Services that interact with databases/APIs
- Code that transforms data
- Logic that validates data
INNER_EOF

# views/ README
cat > data_fabric/views/README.md << 'INNER_EOF'
# views/ - Generated Outputs

This directory contains all **generated, materialized outputs**.

## Structure
- `onboarding/` - Onboarding pipeline stage outputs
- `analytics/` - Analytics and reporting outputs
- `contracts/` - Generated contracts and documents
- `uploads/` - User-uploaded files

## Principles
- ⚠️ **ALL files are .gitignored**
- ⚠️ Files are **ephemeral** (can be deleted/regenerated)
- ⚠️ These are **outputs**, not sources
- ⚠️ No code reviews needed (auto-generated)

## What Goes Here?
- Pipeline stage artifacts
- Generated contracts/documents
- User uploads
- Cached/materialized data
- Temporary processing files

## Cleanup
```bash
# Safe to delete everything (will be regenerated)
rm -rf data_fabric/views/*
```
INNER_EOF

echo "✅ README files created"
EOF

chmod +x scripts/create_readmes.sh
./scripts/create_readmes.sh
````
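As a quick sanity check after Phase 1B, a few lines of Python can report any top-level directory still missing its README. This is a minimal sketch, not one of the guide's scripts; the helper name `missing_readmes` is hypothetical, and the directory names come from the structure above.

```python
# Hypothetical helper (not part of the migration scripts): report which
# expected subdirectories under a root are missing a README.md.
from pathlib import Path

def missing_readmes(root: str, subdirs: list[str]) -> list[str]:
    """Return the subdirectories of `root` that lack a README.md file."""
    return [d for d in subdirs if not (Path(root) / d / "README.md").is_file()]

if __name__ == "__main__":
    # After running create_readmes.sh from the repo root, this should be empty
    print(missing_readmes("data_fabric", ["definitions", "weave", "views"]))
```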
---
## 📦 Week 1: Copy Files (Non-Destructive)
### Phase 2A: Copy Config Files
```bash
# Copy (don't move) config files
cat > scripts/copy_configs.sh << 'EOF'
#!/bin/bash
set -e

echo "Copying config files..."

# Business config
if [ -d "data_fabric/output-styles/config/business/pricing" ]; then
  cp -r data_fabric/output-styles/config/business/pricing/* \
    data_fabric/definitions/config/business/pricing/
  echo "✅ Copied pricing configs"
fi

if [ -d "data_fabric/output-styles/config/business/scoring" ]; then
  cp -r data_fabric/output-styles/config/business/scoring/* \
    data_fabric/definitions/config/business/scoring/
  echo "✅ Copied scoring configs"
fi

# Verify copies
echo ""
echo "Verification:"
ls -la data_fabric/definitions/config/business/pricing/
ls -la data_fabric/definitions/config/business/scoring/

echo ""
echo "✅ Config files copied (originals preserved)"
EOF

chmod +x scripts/copy_configs.sh
./scripts/copy_configs.sh
```

### Phase 2B: Copy Prompt Components
```bash
# Copy prompt components to templates
cat > scripts/copy_prompts.sh << 'EOF'
#!/bin/bash
set -e

echo "Copying prompt components..."

if [ -d "data_fabric/prompts/components" ]; then
  cp -r data_fabric/prompts/components/* \
    data_fabric/definitions/templates/prompts/components/
  echo "✅ Copied prompt components"
fi

# Verify
echo ""
echo "Verification:"
ls -la data_fabric/definitions/templates/prompts/components/

echo ""
echo "✅ Prompt components copied (originals preserved)"
EOF

chmod +x scripts/copy_prompts.sh
./scripts/copy_prompts.sh
```

### Phase 2C: Move Code Files (Builders)
```bash
# Move (not copy) code files to weave/
cat > scripts/move_builders.sh << 'EOF'
#!/bin/bash
set -e

echo "Moving prompt builders to weave..."

if [ -d "data_fabric/prompts/builders" ]; then
  # Builders are code, should be in weave/
  mv data_fabric/prompts/builders/* \
    data_fabric/weave/prompts/builders/
  echo "✅ Moved prompt builders"
fi

# Verify
echo ""
echo "Verification:"
ls -la data_fabric/weave/prompts/builders/

echo ""
echo "✅ Builders moved to weave/"
EOF

chmod +x scripts/move_builders.sh
./scripts/move_builders.sh
```

## 🔄 Week 2: Update Import Paths

### Phase 3A: Create Import Mapping
```python
# scripts/update_imports.py
"""
Automated import path updater for migration
"""
import re
from pathlib import Path
from typing import Dict

# Define import mappings
IMPORT_MAPPINGS: Dict[str, str] = {
    # Config imports
    r'from data_fabric\.output_styles\.config\.business\.pricing':
        'from data_fabric.definitions.config.business.pricing',
    r'from data_fabric\.output_styles\.config\.business\.scoring':
        'from data_fabric.definitions.config.business.scoring',

    # Prompt imports
    r'from data_fabric\.prompts\.components':
        'from data_fabric.definitions.templates.prompts.components',
    r'from data_fabric\.prompts\.builders':
        'from data_fabric.weave.prompts.builders',

    # Knowledge imports (if needed)
    r'from data_fabric\.knowledge':
        'from data_fabric.weave.knowledge',

    # Storage imports (if needed)
    r'from data_fabric\.storage':
        'from data_fabric.weave.storage',
}


def update_imports_in_file(file_path: Path) -> bool:
    """Update imports in a single Python file"""
    try:
        content = file_path.read_text()
        original_content = content

        # Apply all mappings
        for old_pattern, new_import in IMPORT_MAPPINGS.items():
            content = re.sub(old_pattern, new_import, content)

        # Only write if changes were made
        if content != original_content:
            file_path.write_text(content)
            print(f"✅ Updated: {file_path}")
            return True
        return False
    except Exception as e:
        print(f"❌ Error updating {file_path}: {e}")
        return False


def main():
    """Update all Python files"""
    root = Path(".")

    # Find all Python files
    py_files = list(root.rglob("*.py"))

    # Exclude certain directories
    excluded = {"node_modules", ".git", "__pycache__", "venv", ".venv"}
    py_files = [
        f for f in py_files
        if not any(ex in f.parts for ex in excluded)
    ]

    print(f"Found {len(py_files)} Python files")
    print("Updating imports...")
    print()

    updated_count = 0
    for py_file in py_files:
        if update_imports_in_file(py_file):
            updated_count += 1

    print()
    print(f"✅ Updated {updated_count} files")
    print(f"✅ Skipped {len(py_files) - updated_count} files (no changes needed)")


if __name__ == "__main__":
    main()
```

### Phase 3B: Run Import Updates
```bash
# Run the import updater
python scripts/update_imports.py

# Review changes
git diff --stat

# If satisfied, commit
git add .
git commit -m "refactor: Update imports for new data_fabric structure"
```

## ✅ Week 2: Test & Validate
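Before running the test scripts below, you can sanity-check the Phase 3A mapping table on a sample line without touching any files. This is a minimal sketch that copies a subset of `IMPORT_MAPPINGS` from `scripts/update_imports.py`; the `rewrite` helper is illustrative only.

```python
# Minimal sketch: apply a subset of the Phase 3A mappings to one sample
# source line, without modifying any files on disk.
import re

# Subset of IMPORT_MAPPINGS from scripts/update_imports.py
MAPPINGS = {
    r'from data_fabric\.prompts\.builders':
        'from data_fabric.weave.prompts.builders',
    r'from data_fabric\.knowledge':
        'from data_fabric.weave.knowledge',
}

def rewrite(line: str) -> str:
    """Apply every mapping pattern to one line of source."""
    for pattern, replacement in MAPPINGS.items():
        line = re.sub(pattern, replacement, line)
    return line

print(rewrite("from data_fabric.prompts.builders import onboarding_prompts"))
# → from data_fabric.weave.prompts.builders import onboarding_prompts
```

Lines that match no pattern pass through unchanged, which is why the updater only writes files whose content actually differs after substitution.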
### Phase 4A: Test Imports
```python
# scripts/test_imports.py
"""
Validate that all imports still work
"""
import sys


def test_config_imports():
    """Test config imports"""
    try:
        from data_fabric.definitions.config.business.pricing import tier_presets
        print("✅ Config imports work")
        return True
    except ImportError as e:
        print(f"❌ Config import failed: {e}")
        return False


def test_prompt_imports():
    """Test prompt imports"""
    try:
        from data_fabric.weave.prompts.builders import onboarding_prompts
        print("✅ Prompt builder imports work")
        return True
    except ImportError as e:
        print(f"❌ Prompt import failed: {e}")
        return False


def test_knowledge_imports():
    """Test knowledge imports"""
    try:
        from data_fabric.weave.knowledge.retrieval import rag_service
        print("✅ Knowledge imports work")
        return True
    except ImportError as e:
        print(f"❌ Knowledge import failed: {e}")
        return False


def main():
    print("Testing imports after migration...")
    print()

    results = [
        test_config_imports(),
        test_prompt_imports(),
        test_knowledge_imports(),
    ]

    print()
    if all(results):
        print("✅ All imports working!")
        return 0
    else:
        print("❌ Some imports failed")
        return 1


if __name__ == "__main__":
    sys.exit(main())
```

```bash
# Run import tests
python scripts/test_imports.py
```

### Phase 4B: Run Existing Tests
```bash
# Run all unit tests
python -m pytest tests/ -v

# Run specific integration tests
python -m pytest tests/integration/ -v

# Check for any import errors
python -m pytest --co  # Collect tests (will fail if imports are broken)
```

## 🗑️ Week 3: Clean Up Old Structure
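Before anything is deleted this week, it is worth confirming that every copied file is byte-identical to its original. The sketch below is a hypothetical pre-cleanup check, not one of the guide's scripts; it compares two directory trees by SHA-256 and reports any file that is missing or differs in the destination.

```python
# Hypothetical pre-cleanup check: confirm every file under `src` exists
# under `dst` with identical contents (compared by SHA-256 digest).
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def mismatched_files(src: str, dst: str) -> list[str]:
    """Relative paths present in src that are missing or differ in dst."""
    bad = []
    for f in Path(src).rglob("*"):
        if f.is_file():
            rel = f.relative_to(src)
            twin = Path(dst) / rel
            if not twin.is_file() or sha256(f) != sha256(twin):
                bad.append(str(rel))
    return bad
```

If, for example, `mismatched_files("data_fabric/output-styles/config", "data_fabric/definitions/config")` returns an empty list, the Phase 2A copy preserved everything and the old location is safe to remove.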
### Phase 5A: Create Cleanup Script (DRY RUN FIRST!)
```bash
# scripts/cleanup_old_structure.sh
#!/bin/bash
set -e

DRY_RUN=${1:-"--dry-run"}

echo "Cleanup script starting..."
echo "Mode: ${DRY_RUN}"
echo ""

if [ "$DRY_RUN" == "--dry-run" ]; then
  echo "🔍 DRY RUN MODE (no files will be deleted)"
  echo ""
fi

cleanup_file() {
  local file=$1
  local reason=$2

  if [ "$DRY_RUN" == "--dry-run" ]; then
    echo "Would delete: ${file} (${reason})"
  else
    if [ -e "$file" ]; then
      git rm -r "$file"
      echo "✅ Deleted: ${file}"
    fi
  fi
}

# Remove config from old location (now in definitions/config/)
cleanup_file "data_fabric/output-styles/config/" "moved to definitions/config/"

# Remove prompt components (now in definitions/templates/)
cleanup_file "data_fabric/prompts/components/" "moved to definitions/templates/prompts/"

# Note: Keep prompts/builders/ empty since files moved to weave/

if [ "$DRY_RUN" == "--dry-run" ]; then
  echo ""
  echo "✅ Dry run complete. Review changes above."
  echo ""
  echo "To actually delete files, run:"
  echo "  ./scripts/cleanup_old_structure.sh --execute"
else
  echo ""
  echo "✅ Cleanup complete"
  echo ""
  echo "Don't forget to commit:"
  echo "  git commit -m 'refactor: Remove old data_fabric structure after migration'"
fi
```

### Phase 5B: Run Cleanup (Carefully!)
```bash
# First, DRY RUN to see what would be deleted
chmod +x scripts/cleanup_old_structure.sh
./scripts/cleanup_old_structure.sh --dry-run

# Review the output carefully!
# If everything looks good, execute
./scripts/cleanup_old_structure.sh --execute

# Commit the cleanup
git add .
git commit -m "refactor: Remove old data_fabric structure after migration"
```

## 🎯 Post-Migration Checklist
### 1. Verify Structure

```bash
# Check new structure exists
ls -la data_fabric/definitions/
ls -la data_fabric/weave/
ls -la data_fabric/views/

# Check files are in correct locations
ls -la data_fabric/definitions/config/business/pricing/
ls -la data_fabric/weave/prompts/builders/
```

### 2. Run Full Test Suite
```bash
# All tests should still pass
python -m pytest tests/ -v --tb=short

# Integration tests
python -m pytest tests/integration/ -v

# E2E tests
python -m pytest tests/e2e/ -v
```

### 3. Update Documentation
- Update main README
- Update architecture docs
- Update developer onboarding docs

### 4. Team Communication
```markdown
## Migration Complete! 🎉

The `data_fabric/` directory has been reorganized for better clarity:

**New Structure:**
- `definitions/` - Source of truth (git-tracked)
- `weave/` - Operational code (Python modules)
- `views/` - Generated outputs (gitignored)

**What Changed:**
- Config files moved: `output-styles/config/` → `definitions/config/`
- Prompt templates moved: `prompts/components/` → `definitions/templates/prompts/`
- Builders moved: `prompts/builders/` → `weave/prompts/builders/`

**Action Items:**
- Pull latest changes: `git pull`
- No code changes needed (imports auto-updated)
- Read new READMEs in each directory

**Questions?** See `data_fabric/ORGANIZATION_STRATEGY_COMPLETE.md`
```

## 🚨 Rollback Plan (If Needed)
If something goes wrong:

```bash
# 1. Restore from backup
tar -xzf data_fabric_backup_YYYYMMDD_HHMMSS.tar.gz

# 2. Reset git changes
git reset --hard HEAD~1          # Go back one commit
# or
git reset --hard <commit-hash>   # Go back to specific commit

# 3. Verify restoration
python -m pytest tests/ -v

# 4. Document what went wrong
# ... and plan better next time
```

## 📊 Success Metrics
Migration is complete when:

- ✅ All files moved to correct locations
- ✅ No broken imports
- ✅ All tests passing
- ✅ Old structure removed
- ✅ Documentation updated
- ✅ Team informed
- ✅ CI/CD pipeline green

**Bottom Line**: Follow this guide step-by-step, TEST EVERYTHING, and you'll have a clean, well-organized `data_fabric/` structure in 3 weeks.