Architecture
🎯 Complete Organization Strategy Analysis

Source: data_layer/docs/ORGANIZATION_STRATEGY_COMPLETE.md

Date: 2025-01-16
Purpose: Determine optimal folder organization: lifecycle vs slice vs scenario vs hybrid


📋 Current State Assessment

What You Have Now

data_fabric/
├── prompts/              # Lifecycle-ish (generation logic)
├── storage/              # Lifecycle (runtime operations)
├── knowledge/            # Lifecycle (intelligence operations)
├── kb_catalog/           # Mixed (business rules + config)
└── output-styles/        # Scenario-based (onboarding pipeline)
    ├── config/           # ⚠️ Doesn't fit "output-styles"
    ├── onboarding/       # ✅ Scenario-based stages
    └── schemas/          # ⚠️ Duplicates exist elsewhere

Current Organization: MIXED (70% lifecycle, 30% scenario)


🎨 Organization Philosophies Explained

1️⃣ Lifecycle Stage Organization

Definition: Organize by WHERE data exists in its transformation journey

data_fabric/
├── definitions/          # BIRTH: Canonical sources
├── weave/                # LIFE: Active processing
└── views/                # DEATH: Materialized outputs

Mental Model: Assembly Line

  • Raw materials → Processing → Finished goods

Pros:

  • ✅ Clear data flow (source → runtime → output)
  • ✅ Separation of concerns (immutable vs mutable)
  • ✅ Git-friendly (know what to track vs ignore)
  • ✅ Scalable (easy to add new lifecycle stages)
  • ✅ DRY (single source of truth enforced)

Cons:

  • ⚠️ Cross-cutting features span multiple stages
  • ⚠️ Harder to navigate for feature-focused work
  • ⚠️ Requires understanding of data lineage

Best For:

  • Data engineering teams
  • Systems with clear ETL pipelines
  • Multi-storage architectures
  • Version-controlled configuration

2️⃣ Slice/Domain Organization

Definition: Organize by WHAT business capability/domain it serves

data_fabric/
├── pricing/             # Everything pricing-related
│   ├── schemas/
│   ├── config/
│   ├── examples/
│   └── runtime/
├── scoring/
├── contracts/
└── questionnaires/

Mental Model: Vertical Slices

  • Each slice is self-contained

Pros:

  • ✅ Feature co-location (everything for X in one place)
  • ✅ Team ownership (clear boundaries)
  • ✅ Easier feature work (don't jump directories)
  • ✅ Microservice-ready (can extract slices)

Cons:

  • ⚠️ Cross-domain code duplication risk
  • ⚠️ Shared infrastructure unclear placement
  • ⚠️ Inconsistent structure across slices
  • ⚠️ Harder to see system-wide patterns

Best For:

  • Product-focused teams
  • Domain-driven design
  • Microservices architecture
  • Teams with ownership boundaries

3️⃣ Scenario/Workflow Organization

Definition: Organize by WHICH business process/workflow it supports

data_fabric/
├── onboarding/          # Everything for onboarding
│   ├── 01-ingest/
│   ├── 02-classify/
│   └── 03-contract/
├── analytics/
├── real_time_betting/
└── reporting/

Mental Model: User Journeys

  • Follow the business process

Pros:

  • ✅ Business alignment (mirrors operations)
  • ✅ Easy for stakeholders to understand
  • ✅ Clear entry points for workflows
  • ✅ Process optimization visibility

Cons:

  • ⚠️ Massive duplication across scenarios
  • ⚠️ Shared code unclear placement
  • ⚠️ Rigid (hard to support new scenarios)
  • ⚠️ Doesn't reflect code reuse

Best For:

  • Business-driven projects
  • Single workflow focus
  • Prototypes/MVPs
  • Process documentation

4️⃣ Hybrid Organization ⭐ RECOMMENDED

Definition: Lifecycle at top, domain/scenario within stages

data_fabric/
├── definitions/         # LIFECYCLE (immutable, git-tracked)
│   ├── schemas/         # Sliced by domain
│   ├── config/          # Sliced by domain
│   ├── templates/       # Sliced by scenario
│   └── examples/        # Sliced by scenario
│
├── weave/               # LIFECYCLE (runtime, operational)
│   ├── knowledge/       # Slice (intelligence)
│   ├── storage/         # Slice (persistence)
│   └── prompts/         # Slice (generation)
│
└── views/               # LIFECYCLE (outputs, gitignored)
    ├── onboarding/      # Scenario-based
    ├── contracts/       # Scenario-based
    └── analytics/       # Scenario-based

Mental Model: Layered Cake with Flavors

  • Layers = lifecycle stages
  • Flavors = domains/scenarios within

Pros:

  • ✅ Best of both worlds (clear flow + feature co-location)
  • ✅ Flexible (choose organization per layer)
  • ✅ Intuitive (lifecycle for infra, domain for business)
  • ✅ Scalable (add slices without restructuring)

Cons:

  • ⚠️ More complex (two organization principles)
  • ⚠️ Requires discipline (don't mix metaphors)

Best For:

  • Complex systems with multiple concerns
  • Mixed technical/business focus
  • Growing teams
  • YOUR SYSTEM ✅

🎯 Decision Matrix for YOUR System

Your System Characteristics

| Characteristic | Reality | Org Implication |
| --- | --- | --- |
| Multi-storage | PostgreSQL + Redis + Vector | → Lifecycle (separate runtime) |
| Multiple workflows | Onboarding, analytics, contracts | → Scenario (within outputs) |
| Business domains | Pricing, scoring, sports | → Slice (within config) |
| Auto-generation | Config → Examples, Schema → Adapters | → Lifecycle (source vs derived) |
| Team size | Small/Growing | → Hybrid (room to evolve) |
| Git management | Version control critical | → Lifecycle (immutable vs gitignore) |

Conclusion: Hybrid Organization (Lifecycle + Domain/Scenario)


🏗️ Recommended Structure (Complete)

Top-Level: Lifecycle Stages

data_fabric/
├── definitions/         # 🔒 Git-tracked, immutable, canonical
├── weave/               # 🔧 Python modules, operational code
├── views/               # 📊 Generated outputs, gitignored
├── docs/                # 📚 Documentation (lifecycle-agnostic)
├── scripts/             # 🛠️ Maintenance utilities
└── tests/               # ✅ Testing (lifecycle-agnostic)

Level 1: definitions/ - SOURCE OF TRUTH

Organization: DOMAIN-SLICED for schemas/ and config/ (by business capability); SCENARIO-ORGANIZED for templates/ and examples/ (by workflow)

data_fabric/
└── definitions/                          # All canonical data
    │
    ├── schemas/                          # ✅ Keep current structure
    │   ├── domain/v1/                    # Domain models
    │   │   ├── league/
    │   │   ├── sports/
    │   │   ├── contract/
    │   │   ├── pricing/
    │   │   └── questionnaire/
    │   │
    │   ├── generated/                    # Auto-generated adapters
    │   │   ├── drizzle/                  # TypeScript/Drizzle
    │   │   ├── pydantic/                 # Python/Pydantic
    │   │   └── typescript/               # TypeScript interfaces
    │   │
    │   └── README.md
    │
    ├── config/                           # Domain-specific business rules
    │   ├── business/
    │   │   ├── pricing/                  # ← MOVE FROM output-styles/config
    │   │   │   ├── tier_presets.v1.json
    │   │   │   ├── combat.pricing.v1.json
    │   │   │   ├── default.pricing.v1.json
    │   │   │   └── README.md
    │   │   │
    │   │   ├── scoring/                  # ← MOVE FROM output-styles/config
    │   │   │   ├── scoring_model.v1.json
    │   │   │   ├── weights.v1.json
    │   │   │   └── README.md
    │   │   │
    │   │   └── contracts/                # NEW: Contract templates config
    │   │       ├── template_mappings.json
    │   │       ├── clause_library.json
    │   │       └── README.md
    │   │
    │   ├── sports/                       # Sport-specific configs
    │   │   ├── archetypes.json
    │   │   ├── betting_markets.json
    │   │   ├── data_requirements.json
    │   │   └── README.md
    │   │
    │   ├── pipeline/                     # Pipeline stage configs
    │   │   ├── onboarding_stages.json
    │   │   ├── validation_rules.json
    │   │   └── README.md
    │   │
    │   └── README.md                     # Config governance
    │
    ├── templates/                        # SCENARIO-ORGANIZED (by workflow)
    │   ├── prompts/                      # AI prompt templates
    │   │   ├── onboarding/
    │   │   │   ├── extract_questionnaire.j2
    │   │   │   ├── classify_sport.j2
    │   │   │   └── suggest_tier.j2
    │   │   │
    │   │   ├── contracts/
    │   │   │   ├── generate_terms.j2
    │   │   │   └── assemble_document.j2
    │   │   │
    │   │   ├── components/               # Reusable fragments
    │   │   │   ├── system_instructions/
    │   │   │   ├── output_formats/
    │   │   │   └── few_shot/
    │   │   │
    │   │   └── README.md
    │   │
    │   └── contracts/                    # Document templates
    │       ├── term_sheet.md.j2
    │       ├── msa.md.j2
    │       └── README.md
    │
    └── examples/                         # SCENARIO-ORGANIZED (training data)
        ├── onboarding/
        │   ├── questionnaire_extraction/
        │   │   ├── examples.jsonl        # Manual examples
        │   │   ├── metadata.json
        │   │   └── README.md
        │   │
        │   ├── tier_classification/
        │   │   ├── examples.jsonl        # Manual examples
        │   │   ├── generated.jsonl       # ← AUTO-GENERATED from config
        │   │   ├── generator.py          # ← Generation script
        │   │   └── README.md
        │   │
        │   └── contract_assembly/
        │       ├── examples.jsonl
        │       └── README.md
        │
        ├── sports_classification/
        │   ├── by_archetype.jsonl
        │   ├── by_market_readiness.jsonl
        │   └── README.md
        │
        └── README.md                     # Example governance

Why Domain-Sliced Here:

  • ✅ Config naturally groups by domain (pricing, scoring)
  • ✅ Schemas already domain-organized
  • ✅ Templates group by use case (scenario)
  • ✅ Examples group by training task (scenario)
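
The `generated.jsonl` and `generator.py` entries in the tree above imply that some training examples are derived from config. A minimal sketch of that idea follows, assuming a hypothetical shape for the tier presets (a mapping of tier name to an object with a `criteria` field). The real contents of `tier_presets.v1.json` are not shown in this document, so the field names and function names here are illustrative only.

```python
import json


def examples_from_tiers(tier_presets: dict) -> list[dict]:
    """Build one training example per tier preset.

    ASSUMPTION: tier_presets maps tier name -> {"criteria": {...}}.
    This shape is hypothetical; adapt it to the real config file.
    """
    examples = []
    for tier_name, preset in sorted(tier_presets.items()):
        examples.append(
            {
                "input": {"criteria": preset.get("criteria", {})},
                "output": {"tier": tier_name},
                "source": "generated",  # distinguishes these from manual examples
            }
        )
    return examples


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)
```

Because the output is fully derived from config, it can be regenerated at any time, which is what makes keeping `generated.jsonl` separate from the hand-written `examples.jsonl` worthwhile.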

Level 2: weave/ - OPERATIONAL RUNTIME

Organization: TECHNICAL-SLICED (by system capability)

data_fabric/
└── weave/                                # All runtime operations
    │
    ├── knowledge/                        # ✅ Keep structure (AI operations)
    │   ├── __init__.py
    │   ├── embeddings/                   # Vector generation
    │   │   ├── __init__.py
    │   │   ├── service.py
    │   │   └── config.py
    │   │
    │   ├── intent/                       # Query classification
    │   │   ├── __init__.py
    │   │   ├── classifier.py
    │   │   └── patterns.py
    │   │
    │   ├── retrieval/                    # RAG operations
    │   │   ├── __init__.py
    │   │   ├── rag_service.py
    │   │   ├── query_builder.py
    │   │   └── reranker.py
    │   │
    │   ├── storage/                      # Vector DB interface
    │   │   ├── __init__.py
    │   │   ├── langmem_client.py
    │   │   └── vector_store.py
    │   │
    │   └── templates/                    # Dynamic prompt assembly
    │       ├── __init__.py
    │       ├── prompt_builder.py
    │       └── template_loader.py
    │
    ├── storage/                          # ✅ Keep structure (persistence)
    │   ├── __init__.py
    │   ├── examples/                     # ⚠️ This is CODE, not data!
    │   │   ├── __init__.py
    │   │   ├── retriever.py              # Example retrieval system
    │   │   ├── matcher.py                # Example matching logic
    │   │   ├── cache.py                  # Runtime example cache
    │   │   └── data/                     # .gitignore runtime cache
    │   │
    │   ├── postgres/                     # PostgreSQL operations
    │   │   ├── __init__.py
    │   │   ├── client.py
    │   │   └── models/
    │   │
    │   ├── redis/                        # Cache layer
    │   │   ├── __init__.py
    │   │   └── client.py
    │   │
    │   └── supabase/                     # Supabase operations
    │       ├── __init__.py
    │       └── client.py
    │
    ├── prompts/                          # ✅ Enhance (generation logic)
    │   ├── __init__.py
    │   ├── builders/                     # Prompt construction
    │   │   ├── __init__.py
    │   │   ├── onboarding_prompts.py
    │   │   ├── classification_prompts.py
    │   │   ├── contract_prompts.py
    │   │   └── base.py
    │   │
    │   ├── registry/                     # Prompt metadata
    │   │   ├── __init__.py
    │   │   └── catalog.json
    │   │
    │   └── README.md
    │
    ├── generators/                       # NEW: Data generation pipelines
    │   ├── __init__.py
    │   ├── config_to_examples.py         # Config → Examples
    │   ├── schema_to_adapters.py         # Schema → Pydantic/TS
    │   └── contract_assembler.py         # Data → Contracts
    │
    └── validators/                       # NEW: Validation logic
        ├── __init__.py
        ├── schema_validator.py
        ├── config_validator.py
        └── example_validator.py

Why Technical-Sliced Here:

  • ✅ Python modules are technical capabilities
  • ✅ Clear separation of concerns (knowledge vs storage vs generation)
  • ✅ Easy to test (mock boundaries)
  • ✅ Reusable across scenarios

Level 3: views/ - MATERIALIZED OUTPUTS

Organization: SCENARIO-BASED (by business workflow)

data_fabric/
└── views/                                # ⚠️ .gitignore entire directory
    │
    ├── onboarding/                       # Onboarding pipeline outputs
    │   ├── 02-ingest-validate-questionnaire/
    │   │   ├── example_seeds/            # Input seeds
    │   │   ├── validated/                # Validation results
    │   │   └── metadata/                 # Processing metadata
    │   │
    │   ├── 03-enhance-documents/
    │   │   ├── enriched/
    │   │   └── metadata/
    │   │
    │   ├── 04-classify-and-score/
    │   │   ├── classifications/
    │   │   ├── scores/
    │   │   └── recommendations/
    │   │
    │   ├── 05-upsert-and-crossref/
    │   │   ├── upserted/
    │   │   └── relationships/
    │   │
    │   ├── 06-suggest-tiers-and-terms/
    │   │   ├── tier_suggestions/
    │   │   ├── term_suggestions/
    │   │   └── pricing_recommendations/
    │   │
    │   ├── 07-assemble-contract/
    │   │   ├── drafts/
    │   │   ├── final/
    │   │   └── metadata/
    │   │
    │   ├── 07a-output-contract-export/
    │   │   ├── pdf/
    │   │   ├── docx/
    │   │   └── markdown/
    │   │
    │   ├── 07b-output-gamekeeper-scorekeeper-ui/
    │   │   ├── configs/
    │   │   └── data/
    │   │
    │   └── 07c-output-marketing-nxt-onboarding-materials/
    │       ├── presentations/
    │       └── assets/
    │
    ├── analytics/                        # Analytics pipeline outputs
    │   ├── reports/
    │   ├── dashboards/
    │   └── exports/
    │
    ├── contracts/                        # Generated contracts (all workflows)
    │   ├── term_sheets/
    │   ├── msas/
    │   └── amendments/
    │
    └── uploads/                          # User-uploaded files
        ├── questionnaires/
        └── documents/

Why Scenario-Based Here:

  • ✅ Business workflows are scenarios
  • ✅ Each pipeline stage produces artifacts
  • ✅ Easy to clean up (rm -rf views/)
  • ✅ Gitignored (don't track generated files)

📊 Comparison: Current vs Recommended

Current Structure Issues

data_fabric/
├── output-styles/                    # ❌ Mixed metaphor
│   ├── config/                       # ❌ Should be in definitions/
│   ├── onboarding/                   # ✅ Good (scenario-based)
│   └── schemas/                      # ❌ Duplicate of schemas/
│
├── prompts/                          # ⚠️ Mixed (templates + code)
│   ├── components/                   # ⚠️ Should be in definitions/ (templates)
│   └── builders/                     # ✅ Should stay (code)
│
├── kb_catalog/                       # ⚠️ Unclear purpose
│   ├── constants/                    # ✅ Good (business rules)
│   └── manifests/                    # ⚠️ What's this?
│
└── storage/examples/                 # ❌ Confusing (code or data?)

Problems:

  1. Mixed lifecycle stages (source + runtime + output)
  2. Duplicate schemas (schemas/ and output-styles/schemas/)
  3. Unclear metaphors ("output-styles" but has config?)
  4. Code vs data confusion (storage/examples/ is code!)

Recommended Structure Benefits

data_fabric/
├── definitions/                      # ✅ Clear: "source of truth"
│   ├── schemas/                      # ✅ Only place for schemas
│   ├── config/                       # ✅ Only place for business config
│   ├── templates/                    # ✅ Only place for templates
│   └── examples/                     # ✅ Only place for training data
│
├── weave/                            # ✅ Clear: "operational code"
│   ├── knowledge/                    # ✅ AI operations
│   ├── storage/                      # ✅ Persistence operations
│   ├── prompts/                      # ✅ Generation code
│   └── generators/                   # ✅ Transformation code
│
└── views/                            # ✅ Clear: "generated outputs"
    ├── onboarding/                   # ✅ Scenario-based
    └── analytics/                    # ✅ Scenario-based

Benefits:

  1. ✅ Single source of truth (no duplicates)
  2. ✅ Clear lifecycle (definitions → weave → views)
  3. ✅ Git-friendly (track definitions, ignore views)
  4. ✅ Domain-sliced where it matters (config, schemas)
  5. ✅ Scenario-sliced where it matters (pipelines, examples)

🔄 Migration Strategy

Phase 1: Non-Breaking Additions (Week 1)

# Create new structure without deleting old
mkdir -p data_fabric/definitions/{schemas,config,templates,examples}
mkdir -p data_fabric/definitions/config/{business,sports,pipeline}
mkdir -p data_fabric/definitions/templates/{prompts,contracts}
mkdir -p data_fabric/definitions/examples/onboarding
 
mkdir -p data_fabric/weave/{knowledge,storage,prompts,generators,validators}
 
mkdir -p data_fabric/views/{onboarding,analytics,contracts,uploads}
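
The `{a,b,c}` brace expansion used above is a bash/zsh feature and is not part of POSIX `sh`. Where that matters, the same scaffold could be created from Python. This is only a sketch: `LAYOUT` and `create_layout` are illustrative names, and the directory list mirrors the `mkdir` commands above.

```python
from pathlib import Path

# Mirrors the Phase 1 mkdir commands: lifecycle stage -> subdirectories.
LAYOUT = {
    "definitions": [
        "schemas",
        "config/business",
        "config/sports",
        "config/pipeline",
        "templates/prompts",
        "templates/contracts",
        "examples/onboarding",
    ],
    "weave": ["knowledge", "storage", "prompts", "generators", "validators"],
    "views": ["onboarding", "analytics", "contracts", "uploads"],
}


def create_layout(root: Path) -> list[Path]:
    """Create the directory scaffold under `root`; safe to run repeatedly."""
    created = []
    for stage, subdirs in LAYOUT.items():
        for sub in subdirs:
            target = root / stage / sub
            # parents=True creates intermediate dirs; exist_ok=True makes it idempotent.
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created
```

Calling `create_layout(Path("data_fabric"))` leaves existing directories and files untouched, so it fits the non-breaking intent of this phase.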

Phase 2: Copy (Don't Move) Critical Files (Week 1)

# Config files (keep originals as backup)
cp -r data_fabric/output-styles/config/business/* data_fabric/definitions/config/business/
 
# Prompt templates (Phase 1 did not create components/, so create it first)
mkdir -p data_fabric/definitions/templates/prompts/components
cp -r data_fabric/prompts/components/* data_fabric/definitions/templates/prompts/components/
 
# Examples (if any exist outside storage/)
# ... identify and copy

Phase 3: Update Import Paths (Week 2)

# OLD
from database.output_styles.config.business.pricing import tier_presets
 
# NEW
from data_fabric.definitions.config.business.pricing import tier_presets

# Find all references (escape the dot so it matches literally)
grep -rn "output_styles\.config" data_fabric/ --include="*.py"
grep -rn "from database" data_fabric/ --include="*.py"
 
# Automated replacement
# Note: `sed -i ''` is the BSD/macOS form; GNU sed (Linux) takes -i without the empty argument
find data_fabric -name "*.py" -type f -exec sed -i '' \
  's/from database\.output_styles\.config/from data_fabric.definitions.config/g' {} +
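
One caveat: Python's `import` statement loads Python modules, not data files. If `tier_presets` is stored as a JSON file such as `tier_presets.v1.json` (as the definitions tree suggests), it would need to be read with a loader rather than imported. A minimal sketch, assuming the `<name>.<version>.json` naming seen in the tree; `load_business_config` is a hypothetical helper, not an existing API in this codebase.

```python
import json
from pathlib import Path


def load_business_config(
    definitions_root: Path, domain: str, name: str, version: str = "v1"
) -> dict:
    """Read definitions/config/business/<domain>/<name>.<version>.json.

    ASSUMPTION: files follow the "<name>.<version>.json" convention,
    e.g. tier_presets.v1.json under the pricing domain.
    """
    path = definitions_root / "config" / "business" / domain / f"{name}.{version}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A call such as `load_business_config(Path("data_fabric/definitions"), "pricing", "tier_presets")` keeps the path convention in one place, so later moves only require changing the loader.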

Phase 4: Test & Validate (Week 2)

# Run all tests
python -m pytest data_fabric/tests/
 
# Validate imports
python -c "from data_fabric.definitions.config.business.pricing import tier_presets"
 
# Check for broken imports
python scripts/check_imports.py
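
The contents of `scripts/check_imports.py` are not shown here. As one possible approach, a checker could parse each file with the standard `ast` module and flag imports that still point at the old `database.output_styles` package. The function names and the `STALE_PREFIXES` constant below are hypothetical.

```python
import ast
from pathlib import Path

# ASSUMPTION: the old package path taken from the "OLD" import in Phase 3.
STALE_PREFIXES = ("database.output_styles",)


def _is_stale(module: str) -> bool:
    """True if `module` is a stale package or one of its submodules."""
    return any(module == p or module.startswith(p + ".") for p in STALE_PREFIXES)


def stale_imports_in_source(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each stale import in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        hits.extend((node.lineno, name) for name in names if _is_stale(name))
    return sorted(hits)


def scan_tree(root: Path) -> dict[Path, list[tuple[int, str]]]:
    """Scan every .py file under `root` and report files with stale imports."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = stale_imports_in_source(path.read_text(encoding="utf-8"))
        if hits:
            report[path] = hits
    return report
```

Unlike `grep`, parsing the syntax tree ignores matches inside comments and strings, which reduces false positives after the automated replacement.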

Phase 5: Delete Old Structure (Week 3)

# Only after confirming everything works!
git rm -r data_fabric/output-styles/config/
git rm -r data_fabric/prompts/components/  # Already copied to definitions/templates/prompts/components/
 
# Update .gitignore
echo "data_fabric/views/*" >> .gitignore
echo "!data_fabric/views/README.md" >> .gitignore

🎯 Special Considerations

1. kb_catalog/ - Where Does It Go?

Current Location: Top-level (unclear)

Options:

Option A: Merge into definitions/config/

definitions/
└── config/
    ├── business/         # Business rules
    ├── sports/           # Sports config
    └── system/           # NEW: System-level config
        ├── constants.py  # ← FROM kb_catalog/constants/
        └── registry.json # ← FROM kb_catalog/registry/

Option B: Keep as definitions/catalog/

definitions/
├── config/              # Operational config
└── catalog/             # System inventory
    ├── constants/       # Enum-like data
    ├── registry/        # Component registry
    └── manifests/       # Auto-generated inventories

Recommendation: Option B if catalog is auto-generated inventory.
Rationale: Catalogs are metadata ABOUT the system, not config FOR the system.


2. storage/examples/ - Code or Data?

Current Reality: It's CODE (retriever.py, matcher.py)

Decision: Keep in weave/storage/examples/ as a code module

Clarify with README:

# weave/storage/examples/README.md
 
This is a **Python module** for runtime example retrieval, NOT a data directory.
 
Training examples live in: `data_fabric/definitions/examples/`

3. Generated Schemas - Where?

Current: schemas/generated/
Proposed: definitions/schemas/generated/

Rationale: Generated FROM canonical, so still "definitions"

Alternative View: Move to views/schemas/ since they're derived

Recommendation: Keep in definitions/schemas/generated/

  • These are source code (imported by apps)
  • They're checked into git (not gitignored)
  • They're versioned (breaking changes matter)

4. Pipeline Stage Configs - Where?

Question: Should each pipeline stage have its own config?

Current: Global config in output-styles/config/

Recommendation: Centralized in definitions/config/

definitions/
└── config/
    ├── business/          # Domain config (pricing, scoring)
    ├── pipeline/          # Pipeline-wide settings
    │   ├── onboarding_stages.json
    │   └── validation_rules.json
    └── sports/            # Sport-specific config

Rationale:

  • ✅ Single source of truth
  • ✅ Easier to version
  • ✅ Avoids duplication across stages
  • ✅ Pipeline stages READ config, don't OWN it
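
The last point (stages read config but do not own it) could be enforced by handing each stage a read-only view. A small sketch using the standard library's `types.MappingProxyType`; `freeze_config` and `run_stage` are hypothetical names, and the config keys are made up for illustration.

```python
from types import MappingProxyType
from typing import Mapping


def freeze_config(config: dict) -> Mapping:
    """Return a read-only view over a copy of `config`.

    Note: the protection is shallow. Nested dicts or lists inside the
    config can still be mutated unless they are frozen as well.
    """
    return MappingProxyType(dict(config))


def run_stage(stage_name: str, config: Mapping) -> dict:
    """A stand-in pipeline stage that only reads from the shared config."""
    rules = config.get("validation_rules", {})
    return {"stage": stage_name, "rules_applied": sorted(rules)}
```

An accidental write such as `config["validation_rules"] = {}` inside a stage then raises `TypeError` instead of silently changing settings for later stages.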

📝 Final Recommendation Summary

✅ Organization Strategy: HYBRID

  • Top level (Lifecycle): definitions/ → weave/ → views/
  • Within definitions/: Domain-sliced for schemas and config (pricing, scoring, sports); scenario-sliced for templates and examples
  • Within weave/: Technical-sliced (knowledge, storage, prompts, generators, validators)
  • Within views/: Scenario-sliced (onboarding, analytics)

✅ Directory Structure

data_fabric/
├── definitions/         # Lifecycle Stage 1: Source of truth
│   ├── schemas/         # Domain-organized
│   ├── config/          # Domain-organized (business, sports, pipeline)
│   ├── templates/       # Scenario-organized (prompts, contracts)
│   └── examples/        # Scenario-organized (training data)
│
├── weave/               # Lifecycle Stage 2: Runtime operations
│   ├── knowledge/       # Technical slice (AI)
│   ├── storage/         # Technical slice (persistence)
│   ├── prompts/         # Technical slice (generation)
│   ├── generators/      # Technical slice (transformation)
│   └── validators/      # Technical slice (validation)
│
├── views/               # Lifecycle Stage 3: Generated outputs
│   ├── onboarding/      # Scenario-organized
│   ├── analytics/       # Scenario-organized
│   ├── contracts/       # Scenario-organized
│   └── uploads/         # Scenario-organized
│
├── docs/                # Documentation (lifecycle-agnostic)
├── scripts/             # Utilities (lifecycle-agnostic)
└── tests/               # Testing (lifecycle-agnostic)

✅ Migration Priority

  1. Week 1: Create structure, copy (don't move) files
  2. Week 2: Update imports, test thoroughly
  3. Week 3: Delete old structure, update docs

✅ Why This Works

| Concern | Solution |
| --- | --- |
| "Where does X go?" | Lifecycle first → domain/scenario second |
| "Too many directories" | Only 3 lifecycle directories at the top level (definitions, weave, views), alongside docs, scripts, and tests |
| "Hard to navigate" | IDE search + clear README in each |
| "Breaking changes" | Copy-then-migrate strategy |
| "Team confusion" | Visual diagram + onboarding doc |

🎓 Teaching the System

Create: data_fabric/README.md

# Data Fabric Architecture
 
This directory uses a **hybrid lifecycle + domain organization**.
 
## 🗂️ Top-Level Structure
 
- `definitions/` - **Source of truth** (git-tracked, immutable)
- `weave/` - **Operational code** (Python modules, runtime logic)
- `views/` - **Generated outputs** (.gitignored, materialized views)
 
## 🧭 Finding What You Need
 
- **Looking for business rules?** → `definitions/config/business/`
- **Looking for AI prompts?** → `definitions/templates/prompts/`
- **Looking for schemas?** → `definitions/schemas/domain/`
- **Looking for runtime code?** → `weave/{knowledge,storage,prompts}/`
- **Looking for pipeline outputs?** → `views/onboarding/`
 
## 📚 Learn More
 
- [Lifecycle Guide](docs/LIFECYCLE_GUIDE.md)
- [Domain Guide](docs/DOMAIN_GUIDE.md)
- [Scenario Guide](docs/SCENARIO_GUIDE.md)

Bottom Line: Use HYBRID organization with lifecycle at the top level, domain slicing for config/schemas, and scenario slicing for workflows/examples.
