Source: data_layer/docs/START_HERE_ORGANIZATION.md
START HERE: Data Fabric Organization
Quick Reference for New and Existing Team Members
Documentation Index
This directory contains comprehensive organization guides. Read them in this order:
1. ORGANIZATION_DECISION_TREE.md (START HERE)
   - Purpose: Quick "Where should this file go?" decision tree
   - Read if: You need to add/move a file RIGHT NOW
   - Time: 5 minutes
2. ORGANIZATION_STRATEGY_COMPLETE.md
   - Purpose: Deep dive into WHY we organized this way
   - Read if: You want to understand the architecture philosophy
   - Time: 20 minutes
   - Covers:
     - Lifecycle vs Slice vs Scenario organization
     - Complete directory structure
     - Decision matrix for our system
3. MIGRATION_GUIDE_PRACTICAL.md
   - Purpose: Step-by-step migration instructions
   - Read if: You're implementing the new structure
   - Time: 30 minutes (reading), 3 weeks (implementation)
   - Covers:
     - Week-by-week migration plan
     - Scripts for automated migration
     - Testing and validation
     - Rollback procedures
4. BEST_PRACTICES_VALIDATION.md
   - Purpose: Industry best practices validation
   - Read if: You want external validation of this approach
   - Time: 15 minutes
   - Covers:
     - Comparison with industry standards
     - Research citations
     - Why this is "best practice"
5. NAMING_STRATEGY.md & FINAL_NAMING_DECISION.md
   - Purpose: Why we use "data_fabric" instead of "database"
   - Read if: You're curious about naming decisions
   - Time: 10 minutes
TL;DR: The Structure
Current Reality (What You See Today)

    data_fabric/
    ├── prompts/        # Mixed (templates + code)
    ├── storage/        # Python modules (operational)
    ├── knowledge/      # Python modules (AI operations)
    ├── kb_catalog/     # Business rules & registries
    ├── output-styles/  # Mixed (config + pipeline outputs)
    ├── config/         # ⚠️ Should move
    ├── onboarding/     # ✅ Pipeline stages
    └── schemas/        # ⚠️ Duplicates exist

Problem: Mixed organization makes it hard to know where things belong.
Recommended Future State

    data_fabric/
    ├── definitions/       # SOURCE OF TRUTH (git-tracked)
    │   ├── schemas/       # Data structures
    │   ├── config/        # Business rules (pricing, scoring)
    │   ├── templates/     # Prompt & doc templates
    │   ├── examples/      # Training data
    │   └── catalog/       # System metadata
    │
    ├── weave/             # OPERATIONAL CODE (Python modules)
    │   ├── knowledge/     # AI operations
    │   ├── storage/       # Database operations
    │   ├── prompts/       # Prompt building
    │   ├── generators/    # Data transformation
    │   └── validators/    # Data validation
    │
    └── views/             # GENERATED OUTPUTS (gitignored)
        ├── onboarding/    # Pipeline stage results
        ├── contracts/     # Generated documents
        └── uploads/       # User files

Philosophy: Lifecycle-based (source → runtime → output) at the top level, domain/scenario within each level.
Quick Start
If You Need to Add a File RIGHT NOW:
Ask yourself, in this order:
1. Is it gitignored? → YES? Put it in views/
2. Is it a .py file? → YES? Put it in weave/
3. Is it hand-written data? → YES? Put it in definitions/
Still confused? See ORGANIZATION_DECISION_TREE.md
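The three questions above can be read as an ordered check. The following is a minimal sketch of that reading, not a tool that exists in the repository: the function name `suggest_top_level` and its two boolean parameters are hypothetical, and the caller is assumed to already know whether the file is gitignored or hand-written.

```python
from pathlib import PurePosixPath
from typing import Optional


def suggest_top_level(path: str, *, gitignored: bool, hand_written: bool) -> Optional[str]:
    """Apply the three Quick Start questions in order.

    Hypothetical helper for illustration only. Returns the suggested
    top-level directory, or None when none of the questions matches
    (in which case the decision tree document is the next stop).
    """
    # Question 1: gitignored files are generated or uploaded output.
    if gitignored:
        return "views/"
    # Question 2: Python files are operational code.
    if PurePosixPath(path).suffix == ".py":
        return "weave/"
    # Question 3: hand-written data is source of truth.
    if hand_written:
        return "definitions/"
    return None
```

The order matters: a gitignored file goes to views/ before its extension is even considered, which mirrors the order of the questions.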
If You're Planning a Big Change:
- Read ORGANIZATION_STRATEGY_COMPLETE.md
- Use MIGRATION_GUIDE_PRACTICAL.md
- Run tests at every step
- Back up before making changes
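For the "run tests at every step" item, one possible smoke test is to import every module under a package and report the ones that fail. This is a sketch under assumptions: the function name `find_broken_imports` is hypothetical, and it is not a description of what any existing script in the repository does.

```python
import importlib
import pkgutil


def find_broken_imports(package_name: str) -> dict[str, str]:
    """Import every module under package_name and collect failures.

    Hypothetical helper for illustration. Returns a mapping of
    module name to the repr of the exception raised on import.
    """
    try:
        package = importlib.import_module(package_name)
    except Exception as exc:  # the package itself cannot be imported
        return {package_name: repr(exc)}

    failures: dict[str, str] = {}
    # Plain modules have no __path__, so there is nothing to walk.
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(
        search_path,
        prefix=f"{package_name}.",
        onerror=lambda name: None,  # failures are recorded below instead
    ):
        try:
            importlib.import_module(info.name)
        except Exception as exc:
            failures[info.name] = repr(exc)
    return failures
```

An empty result means every module imported cleanly. Running it before and after each move (for example against the package that contains weave/, assuming it is importable from the repository root) shows which imports a move broke.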
Key Concepts
1. Lifecycle Stages (Top-Level Organization)
| Stage | Directory | Contents | Git-Tracked? |
|---|---|---|---|
| SOURCE | definitions/ | Canonical data, schemas, configs | ✅ YES |
| RUNTIME | weave/ | Python operational code | ✅ YES |
| OUTPUT | views/ | Generated/uploaded files | ❌ NO (.gitignore) |
2. Domain Slicing (Within definitions/)
Organized by business capability:
- config/business/pricing/ - Pricing rules
- config/business/scoring/ - Scoring logic
- config/sports/ - Sport-specific data
3. Scenario Slicing (Within views/)
Organized by workflow/pipeline:
- views/onboarding/ - Onboarding pipeline
- views/analytics/ - Analytics workflows
- views/contracts/ - Contract generation
4. Technical Slicing (Within weave/)
Organized by system capability:
- weave/knowledge/ - AI/ML operations
- weave/storage/ - Database operations
- weave/prompts/ - Generation logic
Common Scenarios
Scenario 1: "I have a new JSON config file"

    File: new_feature.config.json
    Type: Configuration data
    Mutable: No (hand-written)
    → definitions/config/business/new_feature.config.json

Scenario 2: "I have a new Python service"

    File: new_service.py
    Type: Operational code
    Imports: Other Python modules
    → weave/{knowledge|storage|prompts}/new_service.py

Scenario 3: "I have a generated contract"

    File: contract_xyz.md
    Type: Generated output
    Mutable: Yes (regenerated)
    → views/contracts/contract_xyz.md

Scenario 4: "I have a Jinja2 prompt template"

    File: suggest_tier.j2
    Type: Template (hand-written)
    Mutable: No (source)
    → definitions/templates/prompts/suggest_tier.j2

Scenario 5: "I have training examples in JSONL"

    File: tier_examples.jsonl
    Type: Training data
    Mutable: No (reference data)
    → definitions/examples/onboarding/tier_classification/tier_examples.jsonl

Detailed Guides by Role
For Developers:
- Start with ORGANIZATION_DECISION_TREE.md
- Reference ORGANIZATION_STRATEGY_COMPLETE.md for deep understanding
For DevOps/Migration Team:
- Read MIGRATION_GUIDE_PRACTICAL.md
- Execute migration scripts week-by-week
- Validate with tests at each stage
For Architects:
- Read ORGANIZATION_STRATEGY_COMPLETE.md
- Review BEST_PRACTICES_VALIDATION.md
- Adapt to your specific needs
For Onboarding New Team Members:
- Start with this document (you're here!)
- Read ORGANIZATION_DECISION_TREE.md
- Skim ORGANIZATION_STRATEGY_COMPLETE.md
FAQ
Q: Why "data_fabric" instead of "database"?
A: See NAMING_STRATEGY.md. Short answer: We integrate multiple storage systems (PostgreSQL + Redis + Vector DB), not just one database.
Q: Why lifecycle organization?
A: See ORGANIZATION_STRATEGY_COMPLETE.md. Short answer: Clear separation of immutable source vs mutable runtime vs ephemeral outputs.
Q: Can I still use the old structure during migration?
A: Yes! The migration is non-breaking. Old and new structures coexist during Weeks 1-2.
Q: What if I put a file in the wrong place?
A: No problem! Just move it using git mv and update any imports. The decision tree helps prevent this.
Q: Why is storage/examples/ code, not data?
A: weave/storage/examples/ is a Python module (retriever.py, matcher.py) for runtime operations. Training data lives in definitions/examples/.
Q: Where do generated schemas go?
A: definitions/schemas/generated/ because they're checked into git and imported by apps (not ephemeral).
Useful Commands
Check Where a File Should Go

    # Use the decision tree
    grep -A 5 "your filename pattern" data_fabric/ORGANIZATION_DECISION_TREE.md

Find All References to a File

    # Before moving a file, find all references
    grep -r "old/path/to/file" . --include="*.py" --include="*.json"

Validate Import Paths

    # After migration, test imports
    python scripts/test_imports.py

Clean Up Generated Files

    # Safe to delete views/ (will be regenerated)
    rm -rf data_fabric/views/*

Red Flags
DON'T:
- ❌ Put Python code in definitions/
- ❌ Put configuration in weave/
- ❌ Git-track files in views/
- ❌ Put generated contracts in definitions/
- ❌ Mix training data with operational code
DO:
- ✅ Keep source of truth in definitions/
- ✅ Keep operational code in weave/
- ✅ Gitignore everything in views/
- ✅ Follow the decision tree when unsure
- ✅ Ask for review on structural changes
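Two of the red flags above can be detected from the filesystem alone. The sketch below is illustrative and makes assumptions: `find_red_flags` is a hypothetical name, and the set of suffixes treated as "configuration" is a guess. It does not check git tracking under views/ or whether a file in definitions/ was generated, since neither can be inferred from file names.

```python
from pathlib import Path

# Assumption: which suffixes count as configuration is not stated in this guide.
CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".toml"}


def find_red_flags(root: Path) -> list[str]:
    """Report Python code in definitions/ and config files in weave/.

    Hypothetical helper for illustration; root is the data_fabric/ directory.
    """
    problems: list[str] = []

    definitions = root / "definitions"
    if definitions.is_dir():
        for path in sorted(definitions.rglob("*.py")):
            rel = path.relative_to(root).as_posix()
            problems.append(f"Python code in definitions/: {rel}")

    weave = root / "weave"
    if weave.is_dir():
        for path in sorted(weave.rglob("*")):
            if path.is_file() and path.suffix in CONFIG_SUFFIXES:
                rel = path.relative_to(root).as_posix()
                problems.append(f"configuration in weave/: {rel}")

    return problems
```

An empty list means neither of the two checked red flags was found. A check like this could run in review or CI so that structural mistakes surface before they are merged.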
Getting Help
- Check the docs (you're reading them!)
- Use the decision tree (ORGANIZATION_DECISION_TREE.md)
- Ask in team chat with the @data-fabric-architecture tag
- Review examples in existing code
- When in doubt, ask before moving!
Bottom Line
The Simple Rule:
- Hand-written? → definitions/
- Executable code? → weave/
- Generated output? → views/
Everything else is just details!
Last Updated: 2025-01-16
Maintained by: Data Architecture Team
Questions? See ORGANIZATION_STRATEGY_COMPLETE.md