🎯 Naming Strategy: Data Architecture

Source: data_layer/docs/NAMING_STRATEGY.md

Date: 2025-01-16
Purpose: Establish consistent, semantic naming conventions for data architecture layers


📊 Current Situation Analysis

Your Existing Metaphors

  1. database/ - Current top-level directory
  2. SOURCE_OF_TRUTH/ - Proposed canonical data layer
  3. output-styles/ - Generated/derived outputs
  4. prompts/, storage/, knowledge/ - Operational layers

The Semantic Problem

  • "database" implies a managed data storage system (PostgreSQL, Redis, etc.)
  • But your directory contains:
    • ✅ Schemas & configs (canonical data)
    • ✅ Python modules (operational code)
    • ✅ Prompt templates (generation logic)
    • ✅ Examples & training data (reference materials)

This is not a "database"; it is a DATA PLATFORM.


🎨 Naming Options Analysis

Option 1: data/ ⭐⭐⭐

Industry Standard Name

data/
├── source/              # Canonical definitions
├── runtime/             # Operational modules
└── generated/           # Pipeline outputs

Pros:

  • ✅ Universal convention in ML/AI
  • ✅ Simple, clear, widely understood
  • ✅ Works with existing tools/frameworks
  • ✅ Low cognitive load

Cons:

  • ❌ Too generic; doesn't convey sophistication
  • ❌ Doesn't reflect multi-layer architecture
  • ❌ Might imply "just data files"

Best For: Traditional ML projects, data science workflows


Option 2: data_layer/ ⭐⭐⭐⭐

Architectural Pattern Name

data_layer/
├── canonical/           # Source of truth
├── operational/         # Runtime services
└── materialized/        # Generated views

Pros:

  • ✅ Communicates architectural thinking
  • ✅ Implies separation of concerns
  • ✅ Familiar to backend engineers
  • ✅ Scales conceptually (presentation layer, logic layer, data layer)

Cons:

  • ❌ Slightly longer name
  • ❌ "Layer" might imply only one responsibility

Best For: Architecturally sophisticated systems with clear layer boundaries


Option 3: data_fabric/ ⭐⭐⭐⭐⭐

Modern Data Architecture Pattern

data_fabric/
├── definitions/         # Canonical schemas & configs
├── weave/              # Integration & transformation logic
└── views/              # Materialized outputs

Definition of Data Fabric:

"A data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems." (IBM, 2024)

Key Characteristics:

  1. Unified Access: Single interface to heterogeneous data sources
  2. Active Metadata: Intelligent understanding of data relationships
  3. Knowledge Graph: Semantic connections between data entities
  4. Automation: Self-service data access and governance

Pros:

  • ✅ Perfect semantic match for your architecture
  • ✅ You literally have:
    • Multiple storage backends (PostgreSQL, Redis, Vector DB)
    • Intelligent metadata (schemas, configs, examples)
    • Knowledge graph operations (embeddings, RAG)
    • Automated generation pipelines
  • ✅ Modern, sophisticated terminology
  • ✅ Communicates integration & orchestration
  • ✅ Metaphorically rich ("fabric" = woven together)

Cons:

  • ❌ Less universally known term
  • ❌ Might require explanation for junior devs
  • ❌ Could be seen as "buzzword-y"

Best For: Modern AI/ML platforms with:

  • Multi-storage strategies
  • Automated data pipelines
  • Semantic understanding layers
  • RAG/vector operations

Option 4: data_platform/ ⭐⭐⭐⭐

Product/Service Oriented Name

data_platform/
├── catalog/            # Data registry
├── services/           # Operational APIs
└── artifacts/          # Generated assets

Pros:

  • ✅ Business-friendly terminology
  • ✅ Implies productized capabilities
  • ✅ Communicates value, not just structure
  • ✅ Good for stakeholder communication

Cons:

  • ❌ Might imply more infrastructure than exists
  • ❌ "Platform" could be misleading at current scale

Best For: Data products, internal tooling, SaaS offerings


🏆 Recommendation: data_fabric/

Why Data Fabric Wins

Your system literally is a data fabric:

Your Implementation Matches Data Fabric Principles:

  1. Unified Access Pattern ✅

    • Single directory structure
    • Consistent APIs across storage backends
    • Abstracted access patterns
  2. Active Metadata Management ✅

    • JSON Schemas as active definitions
    • Auto-generated adapters (Pydantic, TypeScript, Drizzle)
    • Version-controlled configurations
  3. Knowledge Graph Operations ✅

    • Vector embeddings in knowledge/
    • Semantic retrieval via RAG
    • Intent classification and routing
  4. Automated Orchestration ✅

    • Config → Example generation
    • Schema → Adapter generation
    • Source → Runtime deployment
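
The "unified access" principle above can be illustrated with a minimal sketch. Every name here (`StorageBackend`, `InMemoryBackend`, `DataFabric`, the `backend:key` address format) is hypothetical and not taken from the project; the in-memory class only stands in for real PostgreSQL, Redis, or vector-store adapters, which would implement the same two methods.

```python
from typing import Any, Dict, Optional, Protocol, Tuple


class StorageBackend(Protocol):
    """Hypothetical interface: every backend exposes the same two calls."""

    def get(self, key: str) -> Optional[Any]: ...

    def put(self, key: str, value: Any) -> None: ...


class InMemoryBackend:
    """Stand-in for a real backend (PostgreSQL, Redis, vector DB)."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._items.get(key)

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value


class DataFabric:
    """Routes 'backend:key' addresses to whichever backend is registered."""

    def __init__(self) -> None:
        self._backends: Dict[str, StorageBackend] = {}

    def register(self, name: str, backend: StorageBackend) -> None:
        self._backends[name] = backend

    def _resolve(self, address: str) -> Tuple[StorageBackend, str]:
        # Split only on the first colon so keys may contain colons themselves.
        name, _, key = address.partition(":")
        if name not in self._backends:
            raise KeyError(f"no backend registered as {name!r}")
        return self._backends[name], key

    def get(self, address: str) -> Optional[Any]:
        backend, key = self._resolve(address)
        return backend.get(key)

    def put(self, address: str, value: Any) -> None:
        backend, key = self._resolve(address)
        backend.put(key, value)
```

Callers would register one backend per store, for example `fabric.register("cache", InMemoryBackend())`, and then read and write through `fabric.get("cache:some-key")` without knowing which store sits behind the name.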

📁 Proposed Final Structure

data_fabric/                              # The unified data architecture
│
├── definitions/                          # Canonical source of truth
│   ├── schemas/                          # JSON Schema (canonical)
│   ├── configs/                          # Business rules & presets
│   ├── templates/                        # Prompt templates
│   └── examples/                         # Training examples (JSONL)
│
├── weave/                                # Integration & transformation
│   ├── knowledge/                        # Embeddings, RAG, retrieval
│   ├── storage/                          # Multi-backend abstractions
│   ├── prompts/                          # Dynamic prompt builders
│   └── generators/                       # Config → Example pipelines
│
├── views/                                # Materialized/generated outputs
│   ├── onboarding/                       # Pipeline outputs
│   ├── contracts/                        # Generated contracts
│   └── analytics/                        # Computed views
│
└── README.md                             # Architecture overview
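
As a rough sketch, the layout above could be encoded once and reused both to create the directories and to check that a checkout still matches it. The helper names (`LAYOUT`, `expected_dirs`, `scaffold`, `missing_dirs`) are made up for illustration; only the directory names come from the proposed structure.

```python
from pathlib import Path
from typing import Dict, List

# Directory names copied from the proposed structure; the helpers are hypothetical.
LAYOUT: Dict[str, List[str]] = {
    "definitions": ["schemas", "configs", "templates", "examples"],
    "weave": ["knowledge", "storage", "prompts", "generators"],
    "views": ["onboarding", "contracts", "analytics"],
}


def expected_dirs(root: Path) -> List[Path]:
    """Every layer/subdirectory path the structure calls for."""
    return [root / layer / sub for layer, subs in LAYOUT.items() for sub in subs]


def scaffold(root: Path) -> None:
    """Create any directory that does not exist yet; existing ones are left alone."""
    for directory in expected_dirs(root):
        directory.mkdir(parents=True, exist_ok=True)


def missing_dirs(root: Path) -> List[Path]:
    """Report which expected directories are absent under root."""
    return [d for d in expected_dirs(root) if not d.is_dir()]
```

A check such as `missing_dirs(Path("data_fabric"))` could run in CI so the directory tree and the documentation do not drift apart.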

Semantic Clarity

Layer | Purpose | Metaphor
definitions/ | Source of truth | "The thread"
weave/ | Integration logic | "The loom"
views/ | Materialized outputs | "The fabric"

🔄 Alternative: Stick with database/ + Add Context

If changing the name is too disruptive, you could:

database/                                 # Keep existing name
├── _ARCHITECTURE.md                      # NEW: Explain it's a data fabric
├── canonical/                            # Rename: SOURCE_OF_TRUTH
├── operational/                          # Group: weave/ contents
└── materialized/                         # Group: views/ contents

Pros:

  • ✅ No breaking changes
  • ✅ Maintains git history
  • ✅ Less migration work

Cons:

  • ❌ Perpetuates semantic confusion
  • ❌ Doesn't signal architectural sophistication
  • ❌ New team members might misunderstand

🎯 Migration Path (If Choosing data_fabric/)

Phase 1: Rename Directory (Low Risk)

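git mv database data_fabric
# Update Python import paths. GNU sed syntax; BSD/macOS sed differs (-i '' and no \b).
# \b keeps names such as database_utils untouched; it covers "import database" as well as "from database".
find . -type f -name "*.py" -not -path "./.git/*" \
  -exec sed -i -E 's/\b(from|import) database\b/\1 data_fabric/g' {} +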
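
Where `sed` is unavailable or a dry run is preferred, the same rewrite could be sketched in Python. This is an illustration under stated assumptions, not the project's tooling: `IMPORT_RE`, `rewrite_imports`, and `rewrite_tree` are hypothetical names, and the pattern only handles import statements that begin a line. Forms such as `import os, database` or bare `database.foo` references would still need manual review.

```python
import re
from pathlib import Path
from typing import List

# Matches "from database", "from database.x", "import database", "import database as db"
# at the start of a line, but not "database_utils" or text inside a string.
IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)database(?=[\s.,]|$)", re.MULTILINE)


def rewrite_imports(source: str, new_name: str = "data_fabric") -> str:
    """Return source with leading-position imports of `database` renamed."""
    return IMPORT_RE.sub(lambda m: m.group(1) + new_name, source)


def rewrite_tree(root: Path, dry_run: bool = True) -> List[Path]:
    """List the .py files that would change; write them only when dry_run is False."""
    changed: List[Path] = []
    for path in root.rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        updated = rewrite_imports(original)
        if updated != original:
            changed.append(path)
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return changed
```

Running `rewrite_tree(Path("."))` first shows which files would be touched; note that `rglob` also descends into virtual environments, so pointing it at the source directory rather than the repository root is probably safer.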

Phase 2: Restructure Internal Layout

cd data_fabric
mkdir -p definitions/schemas definitions/configs definitions/templates definitions/examples
mkdir -p weave/knowledge weave/storage weave/prompts weave/generators
mkdir -p views/onboarding views/contracts views/analytics
# Move existing files to their new locations with git mv
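
The "move existing files" step is left open above. One possible sketch is a small move plan with a dry-run mode. It assumes the old directories sit directly under the renamed root, which the document does not state. The mapping of `SOURCE_OF_TRUTH/` to `definitions/` and of `knowledge/`, `storage/`, `prompts/` into `weave/` follows the text, while sending `output-styles/` to `views/` is a guess based on its description as generated output. `MOVES`, `plan_moves`, and `apply_moves` are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Tuple

# Old directory -> new directory, both relative to the data_fabric/ root.
# The output-styles -> views entry is an assumption, not stated in the document.
MOVES: Dict[str, str] = {
    "SOURCE_OF_TRUTH": "definitions",
    "knowledge": "weave/knowledge",
    "storage": "weave/storage",
    "prompts": "weave/prompts",
    "output-styles": "views",
}


def plan_moves(root: Path) -> List[Tuple[Path, Path]]:
    """List (source, destination) pairs for the old directories that exist."""
    return [(root / old, root / new) for old, new in MOVES.items() if (root / old).is_dir()]


def apply_moves(root: Path, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Return the plan; move each directory's children only when dry_run is False."""
    plan = plan_moves(root)
    if dry_run:
        return plan
    for src, dst in plan:
        dst.mkdir(parents=True, exist_ok=True)
        for child in sorted(src.iterdir()):
            target = dst / child.name
            if target.is_dir() and not any(target.iterdir()):
                target.rmdir()  # drop an empty placeholder left by mkdir -p
            elif target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.move(str(child), str(target))
        src.rmdir()  # the source is empty once every child has moved
    return plan
```

This uses plain filesystem moves rather than `git mv`; Git usually detects the result as renames when the changes are staged, but that is worth confirming with `git status` before committing.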

Phase 3: Update Documentation

  • Update all README files
  • Regenerate architecture diagrams
  • Update import statements in examples

Estimated Effort: 2-4 hours
Risk Level: Low (mostly file moves)
Breaking Changes: Import paths only
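
Because the only breaking change is the import path, a temporary alias is one way to soften it. The sketch below is an optional idea, not part of the original plan: `alias_package` is a hypothetical helper that registers the new package under the old name in `sys.modules`. It only covers the top-level name; importing a submodule through the old name would load a second copy of that submodule, so the alias is best treated as a short-lived bridge.

```python
import importlib
import sys
import warnings
from types import ModuleType


def alias_package(old_name: str, new_name: str) -> ModuleType:
    """Make `import old_name` resolve to the package now called new_name."""
    module = importlib.import_module(new_name)
    warnings.warn(
        f"'{old_name}' is deprecated; import '{new_name}' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # The import system checks sys.modules first, so later imports of
    # old_name return the same module object as new_name.
    sys.modules[old_name] = module
    return module
```

In this project the call would presumably be `alias_package("database", "data_fabric")`, placed somewhere that runs before any legacy import, such as the application entry point.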


📊 Decision Matrix

Criteria | data/ | data_layer/ | data_fabric/ | data_platform/
Semantic Accuracy | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Industry Recognition | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Future-Proofing | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Team Onboarding | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Metaphor Richness | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
TOTAL | 16/25 | 19/25 | 21/25 | 19/25

🎤 Final Recommendation

Choose data_fabric/ If:

  • ✅ You want to signal architectural sophistication
  • ✅ Your system truly integrates multiple data sources
  • ✅ You're building for scale and complexity
  • ✅ Team is technically mature

Choose data_layer/ If:

  • ✅ You want familiar, safe terminology
  • ✅ You prioritize simplicity over precision
  • ✅ Team includes junior developers
  • ✅ You want broad, immediate recognition

Stick with database/ If:

  • ✅ Migration effort is too high right now
  • ✅ Git history preservation is critical
  • ✅ External integrations reference this path
  • ✅ You add clarifying documentation

📁 Subdirectory Naming (With data_fabric/)

Instead of SOURCE_OF_TRUTH/, use definitions/

Rationale:

  • Shorter, more elegant
  • Industry-standard term
  • Pairs well with "data fabric" metaphor
  • Implies "defining characteristics" of the data

Instead of Mixed Names, use Lifecycle Terms

definitions/   # What the data IS (canonical schemas, configs)
weave/         # How the data FLOWS (integration, transformation)
views/         # What the data BECOMES (materialized, generated)

Metaphor Consistency:

  • Definitions = The thread (raw material)
  • Weave = The loom (transformation process)
  • Views = The fabric (finished product)

🔮 Future Considerations

If you adopt data_fabric/:

  1. Next Addition: data_fabric/catalog/

    • Data lineage tracking
    • Data quality metrics
    • Schema registry interface
  2. Next Addition: data_fabric/governance/

    • Access control policies
    • Data retention rules
    • Compliance documentation
  3. Next Addition: data_fabric/observability/

    • Data flow monitoring
    • Quality dashboards
    • Performance metrics

This sets you up for true Data Fabric capabilities long-term.


🏁 TL;DR

Recommended: data_fabric/ with subdirectories:

  • definitions/ (not SOURCE_OF_TRUTH/)
  • weave/ (operational logic)
  • views/ (materialized outputs)

Why: Your architecture literally is a data fabric, with unified access, active metadata, knowledge graph operations, and automated orchestration across multiple storage backends.

Migration Effort: 2-4 hours (mostly imports)

Alternative: Keep database/ but add _ARCHITECTURE.md explaining it's a data fabric implementation.
