🎯 START HERE: Data Fabric Organization

Source: data_layer/docs/START_HERE_ORGANIZATION.md

Quick Reference for New and Existing Team Members


📚 Documentation Index

This directory contains comprehensive organization guides. Read them in this order:

1️⃣ ORGANIZATION_DECISION_TREE.md ⭐ START HERE

  • Purpose: Quick "Where should this file go?" decision tree
  • Read if: You need to add/move a file RIGHT NOW
  • Time: 5 minutes

2️⃣ ORGANIZATION_STRATEGY_COMPLETE.md

  • Purpose: Deep dive into WHY we organized this way
  • Read if: You want to understand the architecture philosophy
  • Time: 20 minutes
  • Covers:
    • Lifecycle vs Slice vs Scenario organization
    • Complete directory structure
    • Decision matrix for our system

3️⃣ MIGRATION_GUIDE_PRACTICAL.md

  • Purpose: Step-by-step migration instructions
  • Read if: You're implementing the new structure
  • Time: 30 minutes (reading), 3 weeks (implementation)
  • Covers:
    • Week-by-week migration plan
    • Scripts for automated migration
    • Testing and validation
    • Rollback procedures

4️⃣ BEST_PRACTICES_VALIDATION.md

  • Purpose: Industry best practices validation
  • Read if: You want external validation of this approach
  • Time: 15 minutes
  • Covers:
    • Comparison with industry standards
    • Research citations
    • Why this is "best practice"

5️⃣ NAMING_STRATEGY.md & FINAL_NAMING_DECISION.md

  • Purpose: Why we use "data_fabric" instead of "database"
  • Read if: You're curious about naming decisions
  • Time: 10 minutes

πŸ—‚οΈ TL;DR: The Structure

Current Reality (What You See Today)

data_fabric/
├── prompts/              # Mixed (templates + code)
├── storage/              # Python modules (operational)
├── knowledge/            # Python modules (AI operations)
├── kb_catalog/           # Business rules & registries
└── output-styles/        # Mixed (config + pipeline outputs)
    ├── config/           # ⚠️ Should move
    ├── onboarding/       # ✅ Pipeline stages
    └── schemas/          # ⚠️ Duplicates exist

Problem: Mixed organization makes it hard to know where things belong.


Recommended Future State

data_fabric/
├── definitions/         # 🔒 SOURCE OF TRUTH (git-tracked)
│   ├── schemas/         # Data structures
│   ├── config/          # Business rules (pricing, scoring)
│   ├── templates/       # Prompt & doc templates
│   ├── examples/        # Training data
│   └── catalog/         # System metadata
│
├── weave/               # 🔧 OPERATIONAL CODE (Python modules)
│   ├── knowledge/       # AI operations
│   ├── storage/         # Database operations
│   ├── prompts/         # Prompt building
│   ├── generators/      # Data transformation
│   └── validators/      # Data validation
│
└── views/               # 📊 GENERATED OUTPUTS (gitignored)
    ├── onboarding/      # Pipeline stage results
    ├── contracts/       # Generated documents
    └── uploads/         # User files

Philosophy: Lifecycle-based (source → runtime → output) at top level, domain/scenario within each level.


🚀 Quick Start

If You Need to Add a File RIGHT NOW:

Ask yourself:

  1. Is it gitignored?
    → YES? Put it in views/

  2. Is it a .py file?
    → YES? Put it in weave/

  3. Is it hand-written data?
    → YES? Put it in definitions/
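The three questions can be expressed as a small function. This is a minimal sketch, not a project tool: `suggest_location` and its boolean parameters are hypothetical names, and the checks run in the same order as the list (gitignored first, then `.py`, then hand-written data). Anything that falls through returns `None`, meaning "consult ORGANIZATION_DECISION_TREE.md".

```python
from pathlib import PurePath
from typing import Optional


def suggest_location(filename: str, gitignored: bool, hand_written: bool) -> Optional[str]:
    """Apply the Quick Start questions in order (hypothetical helper).

    Returns a top-level data_fabric/ directory, or None when none of the
    three rules applies and the full decision tree should be consulted.
    """
    # 1. Gitignored files (generated or uploaded) belong in views/.
    if gitignored:
        return "views/"
    # 2. Python source files are operational code and belong in weave/.
    if PurePath(filename).suffix == ".py":
        return "weave/"
    # 3. Hand-written data (configs, schemas, templates, examples) is source of truth.
    if hand_written:
        return "definitions/"
    # No rule matched: fall back to ORGANIZATION_DECISION_TREE.md.
    return None


print(suggest_location("contract_xyz.md", gitignored=True, hand_written=False))          # views/
print(suggest_location("new_service.py", gitignored=False, hand_written=True))           # weave/
print(suggest_location("new_feature.config.json", gitignored=False, hand_written=True))  # definitions/
```

Order matters: new_service.py is also hand-written, but the `.py` check fires before the hand-written check, so it lands in weave/.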

Still confused? See ORGANIZATION_DECISION_TREE.md


If You're Planning a Big Change:

  1. Read ORGANIZATION_STRATEGY_COMPLETE.md
  2. Use MIGRATION_GUIDE_PRACTICAL.md
  3. Run tests at every step
  4. Back up before making changes

🎓 Key Concepts

1. Lifecycle Stages (Top-Level Organization)

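| Stage   | Directory    | Contents                         | Git-Tracked?       |
|---------|--------------|----------------------------------|--------------------|
| SOURCE  | definitions/ | Canonical data, schemas, configs | ✅ YES             |
| RUNTIME | weave/       | Python operational code          | ✅ YES             |
| OUTPUT  | views/       | Generated/uploaded files         | ❌ NO (.gitignore) |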
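The lifecycle table can also be read as a lookup from a path's top-level directory to its stage. A minimal sketch, assuming paths are written relative to the repository with `data_fabric` as the root; `Stage`, `STAGES`, and `stage_of` are hypothetical names, not part of the codebase. Legacy directories such as `kb_catalog/` map to no stage and return `None`.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    git_tracked: bool


# Top-level directory -> lifecycle stage, mirroring the table.
STAGES = {
    "definitions": Stage("SOURCE", git_tracked=True),
    "weave": Stage("RUNTIME", git_tracked=True),
    "views": Stage("OUTPUT", git_tracked=False),
}


def stage_of(path: str, root: str = "data_fabric") -> Optional[Stage]:
    """Return the lifecycle stage of a path under `root`, or None if it is outside the three stages."""
    parts = PurePosixPath(path).parts
    if root not in parts:
        return None
    idx = parts.index(root)
    if idx + 1 >= len(parts):
        return None  # the root itself has no stage
    return STAGES.get(parts[idx + 1])


print(stage_of("data_fabric/weave/storage/retriever.py"))
# Stage(name='RUNTIME', git_tracked=True)
print(stage_of("data_fabric/views/contracts/contract_xyz.md"))
# Stage(name='OUTPUT', git_tracked=False)
```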

2. Domain Slicing (Within definitions/)

Organized by business capability:

  • config/business/pricing/ - Pricing rules
  • config/business/scoring/ - Scoring logic
  • config/sports/ - Sport-specific data

3. Scenario Slicing (Within views/)

Organized by workflow/pipeline:

  • views/onboarding/ - Onboarding pipeline
  • views/analytics/ - Analytics workflows
  • views/contracts/ - Contract generation

4. Technical Slicing (Within weave/)

Organized by system capability:

  • weave/knowledge/ - AI/ML operations
  • weave/storage/ - Database operations
  • weave/prompts/ - Generation logic

🚦 Common Scenarios

Scenario 1: "I have a new JSON config file"

File: new_feature.config.json
Type: Configuration data
Mutable: No (hand-written)

→ definitions/config/business/new_feature.config.json

Scenario 2: "I have a new Python service"

File: new_service.py
Type: Operational code
Imports: Other Python modules

→ weave/{knowledge|storage|prompts}/new_service.py

Scenario 3: "I have a generated contract"

File: contract_xyz.md
Type: Generated output
Mutable: Yes (regenerated)

→ views/contracts/contract_xyz.md

Scenario 4: "I have a Jinja2 prompt template"

File: suggest_tier.j2
Type: Template (hand-written)
Mutable: No (source)

→ definitions/templates/prompts/suggest_tier.j2

Scenario 5: "I have training examples in JSONL"

File: tier_examples.jsonl
Type: Training data
Mutable: No (reference data)

→ definitions/examples/onboarding/tier_classification/tier_examples.jsonl
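Runtime code in weave/ reads these files; it does not own them. A minimal standard-library sketch of loading a JSONL file, assuming one JSON object per line; `load_jsonl` is a hypothetical helper, no particular record fields are assumed, and the `data_fabric/` prefix on the sample path is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Read a JSONL file: one JSON object per non-blank line (hypothetical helper)."""
    records: List[Dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, including a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Include the line number so the bad example is easy to locate.
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
    return records


if __name__ == "__main__":
    # Path from Scenario 5, assumed to sit under data_fabric/; read only if present.
    sample = Path("data_fabric/definitions/examples/onboarding/tier_classification/tier_examples.jsonl")
    if sample.exists():
        print(f"{len(load_jsonl(sample))} examples loaded")
```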

📖 Detailed Guides by Role

For Developers:

  1. Start with ORGANIZATION_DECISION_TREE.md
  2. Reference ORGANIZATION_STRATEGY_COMPLETE.md for deep understanding

For DevOps/Migration Team:

  1. Read MIGRATION_GUIDE_PRACTICAL.md
  2. Execute migration scripts week-by-week
  3. Validate with tests at each stage

For Architects:

  1. Read ORGANIZATION_STRATEGY_COMPLETE.md
  2. Review BEST_PRACTICES_VALIDATION.md
  3. Adapt to your specific needs

For Onboarding New Team Members:

  1. Start with this document (you're here!)
  2. Read ORGANIZATION_DECISION_TREE.md
  3. Skim ORGANIZATION_STRATEGY_COMPLETE.md

❓ FAQ

Q: Why "data_fabric" instead of "database"?

A: See NAMING_STRATEGY.md. Short answer: We integrate multiple storage systems (PostgreSQL + Redis + Vector DB), not just one database.

Q: Why lifecycle organization?

A: See ORGANIZATION_STRATEGY_COMPLETE.md. Short answer: Clear separation of immutable source vs mutable runtime vs ephemeral outputs.

Q: Can I still use the old structure during migration?

A: Yes! The migration is non-breaking. Old and new structures coexist during Weeks 1-2.

Q: What if I put a file in the wrong place?

A: No problem! Just move it using git mv and update any imports. The decision tree helps prevent this.

Q: Why is storage/examples/ code, not data?

A: weave/storage/examples/ is a Python module (retriever.py, matcher.py) for runtime operations. Training data lives in definitions/examples/.

Q: Where do generated schemas go?

A: definitions/schemas/generated/ because they're checked into git and imported by apps (not ephemeral).


πŸ› οΈ Useful Commands

Check Where a File Should Go

# Use the decision tree
grep -A 5 "your filename pattern" data_fabric/ORGANIZATION_DECISION_TREE.md

Find All References to a File

# Before moving a file, find all references (path form and dotted Python import form)
grep -rn -e "old/path/to/file" -e "old\.path\.to\.file" . --include="*.py" --include="*.json"

Validate Import Paths

# After migration, test imports
python scripts/test_imports.py
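This document does not show what scripts/test_imports.py contains. One plausible approach, sketched here as an assumption, is to import a package and walk every submodule with `pkgutil.walk_packages`, recording failures instead of stopping at the first one. `check_imports` is a hypothetical name, and the package must be importable from the current working directory (or `sys.path`).

```python
import importlib
import pkgutil
from typing import Dict


def check_imports(package_name: str) -> Dict[str, str]:
    """Import a package and all of its submodules; return {module_name: error} for failures."""
    failures: Dict[str, str] = {}
    try:
        package = importlib.import_module(package_name)
    except Exception as exc:  # the top-level package itself cannot be imported
        return {package_name: repr(exc)}

    def on_walk_error(name: str) -> None:
        # walk_packages calls this when it cannot import a subpackage to recurse into it.
        failures.setdefault(name, "import failed while walking subpackages")

    # Only packages have __path__; a plain module has no submodules to walk.
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(search_path, prefix=package_name + ".", onerror=on_walk_error):
        try:
            importlib.import_module(info.name)
        except Exception as exc:
            failures[info.name] = repr(exc)
    return failures
```

A wrapper script could call `check_imports("data_fabric.weave")` (package name assumed) after each migration step and exit non-zero whenever the returned dict is non-empty.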

Clean Up Generated Files

# Safe to delete views/ (will be regenerated)
rm -rf data_fabric/views/*

🚨 Red Flags

DON'T:

  • ❌ Put Python code in definitions/
  • ❌ Put configuration in weave/
  • ❌ Git-track files in views/
  • ❌ Put generated contracts in definitions/
  • ❌ Mix training data with operational code

DO:

  • ✅ Keep source of truth in definitions/
  • ✅ Keep operational code in weave/
  • ✅ Gitignore everything in views/
  • ✅ Follow the decision tree when unsure
  • ✅ Ask for review on structural changes
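Two of the DON'T items can be checked mechanically. A minimal sketch, assuming a data_fabric/ root containing definitions/ and weave/; `find_red_flags` and the set of config suffixes are hypothetical choices, so adjust them to the project's actual file types.

```python
from pathlib import Path
from typing import List

# Hypothetical set of suffixes treated as "configuration"; extend as needed.
CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".toml"}


def find_red_flags(root: Path) -> List[str]:
    """Return human-readable warnings for misplaced files under a data_fabric/ root."""
    warnings: List[str] = []
    definitions = root / "definitions"
    weave = root / "weave"
    # DON'T put Python code in definitions/.
    if definitions.is_dir():
        for path in sorted(definitions.rglob("*.py")):
            warnings.append(f"Python code in definitions/: {path.relative_to(root).as_posix()}")
    # DON'T put configuration in weave/.
    if weave.is_dir():
        for path in sorted(weave.rglob("*")):
            if path.is_file() and path.suffix in CONFIG_SUFFIXES:
                warnings.append(f"Config file in weave/: {path.relative_to(root).as_posix()}")
    return warnings
```

The third item, tracked files in views/, needs git itself: `git ls-files data_fabric/views` prints any tracked files there and should print nothing.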

📬 Getting Help

  1. Check the docs (you're reading them!)
  2. Use the decision tree (ORGANIZATION_DECISION_TREE.md)
  3. Ask in team chat with @data-fabric-architecture tag
  4. Review examples in existing code
  5. When in doubt, ask before moving!

🎯 Bottom Line

The Simple Rule:

  • Hand-written? → definitions/
  • Executable code? → weave/
  • Generated output? → views/

Everything else is just details!


Last Updated: 2025-01-16
Maintained by: Data Architecture Team
Questions? See ORGANIZATION_STRATEGY_COMPLETE.md

partnership@altsportsdata.com
dev@altsportsleagues.ai

2025 © AltSportsLeagues.ai. Powered by AI-driven sports business intelligence.
