Source: data_layer/docs/START_HERE_ORGANIZATION.md

🎯 START HERE: Data Fabric Organization

Quick Reference for New and Existing Team Members

📚 Documentation Index

This directory contains comprehensive organization guides. Read them in this order:

1️⃣ ORGANIZATION_DECISION_TREE.md ⭐ START HERE

Purpose: Quick "Where should this file go?" decision tree
Read if: You need to add/move a file RIGHT NOW
Time: 5 minutes

2️⃣ ORGANIZATION_STRATEGY_COMPLETE.md

Purpose: Deep dive into WHY we organized this way
Read if: You want to understand the architecture philosophy
Time: 20 minutes
Covers:
- Lifecycle vs Slice vs Scenario organization
- Complete directory structure
- Decision matrix for our system

3️⃣ MIGRATION_GUIDE_PRACTICAL.md

Purpose: Step-by-step migration instructions
Read if: You're implementing the new structure
Time: 30 minutes (reading), 3 weeks (implementation)
Covers:
- Week-by-week migration plan
- Scripts for automated migration
- Testing and validation
- Rollback procedures

4️⃣ BEST_PRACTICES_VALIDATION.md

Purpose: Industry best practices validation
Read if: You want external validation of this approach
Time: 15 minutes
Covers:
- Comparison with industry standards
- Research citations
- Why this is "best practice"

5️⃣ NAMING_STRATEGY.md & FINAL_NAMING_DECISION.md

Purpose: Why we use "data_fabric" instead of "database"
Read if: You're curious about naming decisions
Time: 10 minutes

🗂️ TL;DR: The Structure

Current Reality (What You See Today)

data_fabric/
├── prompts/              # Mixed (templates + code)
├── storage/              # Python modules (operational)
├── knowledge/            # Python modules (AI operations)
├── kb_catalog/           # Business rules & registries
└── output-styles/        # Mixed (config + pipeline outputs)
    ├── config/           # ⚠️ Should move
    ├── onboarding/       # ✅ Pipeline stages
    └── schemas/          # ⚠️ Duplicates exist

Problem: Mixed organization makes it hard to know where things belong.

Recommended Future State

data_fabric/
├── definitions/         # 🔒 SOURCE OF TRUTH (git-tracked)
│   ├── schemas/         # Data structures
│   ├── config/          # Business rules (pricing, scoring)
│   ├── templates/       # Prompt & doc templates
│   ├── examples/        # Training data
│   └── catalog/         # System metadata
│
├── weave/              # 🔧 OPERATIONAL CODE (Python modules)
│   ├── knowledge/       # AI operations
│   ├── storage/         # Database operations
│   ├── prompts/         # Prompt building
│   ├── generators/      # Data transformation
│   └── validators/      # Data validation
│
└── views/              # 📊 GENERATED OUTPUTS (gitignored)
    ├── onboarding/      # Pipeline stage results
    ├── contracts/       # Generated documents
    └── uploads/         # User files

Philosophy: Lifecycle-based (source → runtime → output) at top level, domain/scenario within each level.

🚀 Quick Start

If You Need to Add a File RIGHT NOW:

Ask yourself:

Is it gitignored?
→ YES? Put it in views/
Is it a .py file?
→ YES? Put it in weave/
Is it hand-written data?
→ YES? Put it in definitions/

Still confused? See ORGANIZATION_DECISION_TREE.md
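
The three questions above can be read as an ordered check. The following is a minimal sketch, not part of the repository: the function name `suggest_directory` and its two boolean parameters are hypothetical, and deciding whether a file is gitignored or hand-written is left to the caller.

```python
from pathlib import PurePosixPath
from typing import Optional


def suggest_directory(filename: str, *, gitignored: bool, hand_written: bool) -> Optional[str]:
    """Apply the three quick-start questions in order.

    Returns the suggested top-level directory, or None when no question
    matches (in which case ORGANIZATION_DECISION_TREE.md is the next stop).
    """
    if gitignored:                                  # Is it gitignored?
        return "views/"
    if PurePosixPath(filename).suffix == ".py":     # Is it a .py file?
        return "weave/"
    if hand_written:                                # Is it hand-written data?
        return "definitions/"
    return None


print(suggest_directory("suggest_tier.j2", gitignored=False, hand_written=True))
```

The order matters: a gitignored file goes to views/ even if it happens to be a .py file, because the first question is asked first.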

If You're Planning a Big Change:

Read ORGANIZATION_STRATEGY_COMPLETE.md
Use MIGRATION_GUIDE_PRACTICAL.md
Run tests at every step
Backup before making changes

🎓 Key Concepts

1. Lifecycle Stages (Top-Level Organization)

Stage	Directory	Contents	Git-Tracked?
SOURCE	`definitions/`	Canonical data, schemas, configs	✅ YES
RUNTIME	`weave/`	Python operational code	✅ YES
OUTPUT	`views/`	Generated/uploaded files	❌ NO (.gitignore)

2. Domain Slicing (Within definitions/)

Organized by business capability:

config/business/pricing/ - Pricing rules
config/business/scoring/ - Scoring logic
config/sports/ - Sport-specific data

3. Scenario Slicing (Within views/)

Organized by workflow/pipeline:

views/onboarding/ - Onboarding pipeline
views/analytics/ - Analytics workflows
views/contracts/ - Contract generation

4. Technical Slicing (Within weave/)

Organized by system capability:

weave/knowledge/ - AI/ML operations
weave/storage/ - Database operations
weave/prompts/ - Generation logic

🚦 Common Scenarios

Scenario 1: "I have a new JSON config file"

File: new_feature.config.json
Type: Configuration data
Mutable: No (hand-written)

→ definitions/config/business/new_feature.config.json

Scenario 2: "I have a new Python service"

File: new_service.py
Type: Operational code
Imports: Other Python modules

→ weave/{knowledge|storage|prompts}/new_service.py

Scenario 3: "I have a generated contract"

File: contract_xyz.md
Type: Generated output
Mutable: Yes (regenerated)

→ views/contracts/contract_xyz.md

Scenario 4: "I have a Jinja2 prompt template"

File: suggest_tier.j2
Type: Template (hand-written)
Mutable: No (source)

→ definitions/templates/prompts/suggest_tier.j2

Scenario 5: "I have training examples in JSONL"

File: tier_examples.jsonl
Type: Training data
Mutable: No (reference data)

→ definitions/examples/onboarding/tier_classification/tier_examples.jsonl
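
The five scenarios can be summarized as a lookup from filename pattern to destination. This is an illustrative sketch only: the name `destination_for` and the glob patterns are assumptions inferred from the example filenames above, not rules the project defines. The Python entry returns weave/ without choosing a subdirectory, and the JSONL entry stops at definitions/examples/ because the rest of the path depends on the workflow.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (glob pattern, destination) pairs inferred from Scenarios 1-5.
# Hypothetical: the real project may classify files differently.
SCENARIO_DESTINATIONS = [
    ("*.config.json", "definitions/config/business/"),   # Scenario 1
    ("*.py", "weave/"),                                   # Scenario 2
    ("contract_*.md", "views/contracts/"),                # Scenario 3
    ("*.j2", "definitions/templates/prompts/"),           # Scenario 4
    ("*.jsonl", "definitions/examples/"),                 # Scenario 5
]


def destination_for(filename: str) -> Optional[str]:
    """Return the first matching destination, or None if no scenario applies."""
    for pattern, destination in SCENARIO_DESTINATIONS:
        if fnmatchcase(filename, pattern):
            return destination
    return None


print(destination_for("tier_examples.jsonl"))
```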

📖 Detailed Guides by Role

For Developers:

Start with ORGANIZATION_DECISION_TREE.md
Reference ORGANIZATION_STRATEGY_COMPLETE.md for deep understanding

For DevOps/Migration Team:

Read MIGRATION_GUIDE_PRACTICAL.md
Execute migration scripts week-by-week
Validate with tests at each stage

For Architects:

Read ORGANIZATION_STRATEGY_COMPLETE.md
Review BEST_PRACTICES_VALIDATION.md
Adapt to your specific needs

For Onboarding New Team Members:

Start with this document (you're here!)
Read ORGANIZATION_DECISION_TREE.md
Skim ORGANIZATION_STRATEGY_COMPLETE.md

❓ FAQ

Q: Why "data_fabric" instead of "database"?

A: See NAMING_STRATEGY.md. Short answer: We integrate multiple storage systems (PostgreSQL + Redis + Vector DB), not just one database.

Q: Why lifecycle organization?

A: See ORGANIZATION_STRATEGY_COMPLETE.md. Short answer: Clear separation of immutable source vs mutable runtime vs ephemeral outputs.

Q: Can I still use the old structure during migration?

A: Yes! The migration is non-breaking. Old and new structures coexist during Weeks 1-2.

Q: What if I put a file in the wrong place?

A: No problem! Just move it using git mv and update any imports. The decision tree helps prevent this.

Q: Why is `storage/examples/` code, not data?

A: weave/storage/examples/ is a Python package (retriever.py, matcher.py) used for runtime operations. Training data lives in definitions/examples/.

Q: Where do generated schemas go?

A: definitions/schemas/generated/ because they're checked into git and imported by apps (not ephemeral).

🛠️ Useful Commands

Check Where a File Should Go

# Use the decision tree
grep -A 5 "your filename pattern" data_fabric/ORGANIZATION_DECISION_TREE.md

Find All References to a File

# Before moving a file, find all references
grep -r "old/path/to/file" . --include="*.py" --include="*.json"

Validate Import Paths

# After migration, test imports
python scripts/test_imports.py
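
The contents of scripts/test_imports.py are not shown in this document. As a rough idea of what an import check can look like, here is a minimal sketch using only the standard library. The function name `find_import_failures` is hypothetical, and the sketch assumes the package under test is importable from the current working directory.

```python
import importlib
import pkgutil
from typing import Dict


def find_import_failures(package_name: str) -> Dict[str, str]:
    """Import a package and each of its submodules.

    Returns a mapping of module name to error text for every import that
    failed; an empty dict means everything imported cleanly.
    """
    failures: Dict[str, str] = {}
    try:
        package = importlib.import_module(package_name)
    except Exception as exc:
        return {package_name: repr(exc)}

    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return failures  # a plain module has no submodules to walk

    def record_walk_error(name: str) -> None:
        # Called by walk_packages when it cannot import a subpackage.
        failures.setdefault(name, "could not be imported while walking")

    for info in pkgutil.walk_packages(search_path, prefix=package_name + ".",
                                      onerror=record_walk_error):
        try:
            importlib.import_module(info.name)
        except Exception as exc:
            failures[info.name] = repr(exc)
    return failures


# Demonstration against a standard-library package.
print(find_import_failures("json"))
```

After a migration step, the same function could be pointed at the moved package (for example `find_import_failures("data_fabric.weave")`, assuming that is its import name) and any non-empty result treated as a failed step.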

Clean Up Generated Files

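# Generated outputs are safe to delete (they will be regenerated).
# Leave views/uploads/ alone: it holds user files, which are not regenerated.
rm -rf data_fabric/views/onboarding/* data_fabric/views/contracts/*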

🚨 Red Flags

DON'T:

❌ Put Python code in definitions/
❌ Put configuration in weave/
❌ Git-track files in views/
❌ Put generated contracts in definitions/
❌ Mix training data with operational code

DO:

✅ Keep source of truth in definitions/
✅ Keep operational code in weave/
✅ Gitignore everything in views/
✅ Follow the decision tree when unsure
✅ Ask for review on structural changes
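
Two of the red flags above (tracked files in views/ and Python code in definitions/) can be checked mechanically. The sketch below is one possible approach, not an existing project script. The names `find_red_flags` and `tracked_files` are hypothetical, and it assumes the tree lives under a top-level data_fabric/ directory.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Iterable, List


def find_red_flags(tracked_paths: Iterable[str], root: str = "data_fabric") -> List[str]:
    """Return one message per git-tracked path that breaks a DON'T rule."""
    problems = []
    for raw in tracked_paths:
        path = PurePosixPath(raw)
        try:
            relative = path.relative_to(root)
        except ValueError:
            continue  # outside data_fabric/, not covered by these rules
        if not relative.parts:
            continue
        top = relative.parts[0]
        if top == "views":
            problems.append(f"{raw}: tracked file under views/ (should be gitignored)")
        elif top == "definitions" and path.suffix == ".py":
            problems.append(f"{raw}: Python code under definitions/ (belongs in weave/)")
    return problems


def tracked_files(repo_dir: str = ".") -> List[str]:
    """List git-tracked paths via `git ls-files` (requires git on PATH)."""
    result = subprocess.run(["git", "ls-files"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Example with a hand-written path list instead of a real repository.
sample = [
    "data_fabric/views/contracts/contract_xyz.md",
    "data_fabric/definitions/helpers.py",
    "data_fabric/weave/storage/db.py",
]
for message in find_red_flags(sample):
    print(message)
```

In a real checkout, `find_red_flags(tracked_files())` run from the repository root would apply the same rules to the actual tracked files. The remaining red flags (configuration in weave/, training data mixed with code) need human judgment and are not covered here.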

📬 Getting Help

Check the docs (you're reading them!)
Use the decision tree (ORGANIZATION_DECISION_TREE.md)
Ask in team chat with @data-fabric-architecture tag
Review examples in existing code
When in doubt, ask before moving!

🎯 Bottom Line

The Simple Rule:

Hand-written? → definitions/
Executable code? → weave/
Generated output? → views/

Everything else is just details!

Last Updated: 2025-01-16
Maintained by: Data Architecture Team
Questions? See ORGANIZATION_STRATEGY_COMPLETE.md
