Source: data_layer/docs/NAMING_STRATEGY.md
Naming Strategy: Data Architecture
Date: 2025-01-16
Purpose: Establish consistent, semantic naming conventions for data architecture layers
Current Situation Analysis
Your Existing Metaphors
- database/: Current top-level directory
- SOURCE_OF_TRUTH/: Proposed canonical data layer
- output-styles/: Generated/derived outputs
- prompts/, storage/, knowledge/: Operational layers
The Semantic Problem
- "database" implies a managed data storage system (PostgreSQL, Redis, etc.)
- But your directory contains:
  - Schemas & configs (canonical data)
  - Python modules (operational code)
  - Prompt templates (generation logic)
  - Examples & training data (reference materials)
This is not a "database"; it's a DATA PLATFORM.
Naming Options Analysis
Option 1: data/ ⭐⭐⭐
Industry Standard Name
data/
├── source/      # Canonical definitions
├── runtime/     # Operational modules
└── generated/   # Pipeline outputs
Pros:
- Universal convention in ML/AI
- Simple, clear, widely understood
- Works with existing tools/frameworks
- Low cognitive load
Cons:
- Too generic; doesn't convey sophistication
- Doesn't reflect multi-layer architecture
- Might imply "just data files"
Best For: Traditional ML projects, data science workflows
Option 2: data_layer/ ⭐⭐⭐⭐
Architectural Pattern Name
data_layer/
├── canonical/      # Source of truth
├── operational/    # Runtime services
└── materialized/   # Generated views
Pros:
- Communicates architectural thinking
- Implies separation of concerns
- Familiar to backend engineers
- Scales conceptually (presentation layer, logic layer, data layer)
Cons:
- Slightly longer name
- "Layer" might imply only one responsibility
Best For: Architecturally sophisticated systems with clear layer boundaries
Option 3: data_fabric/ ⭐⭐⭐⭐⭐
Modern Data Architecture Pattern
data_fabric/
├── definitions/   # Canonical schemas & configs
├── weave/         # Integration & transformation logic
└── views/         # Materialized outputs
Definition of Data Fabric:
"A data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems." (IBM, 2024)
Key Characteristics:
- Unified Access: Single interface to heterogeneous data sources
- Active Metadata: Intelligent understanding of data relationships
- Knowledge Graph: Semantic connections between data entities
- Automation: Self-service data access and governance
Pros:
- Perfect semantic match for your architecture
- You literally have:
  - Multiple storage backends (PostgreSQL, Redis, vector DB)
  - Intelligent metadata (schemas, configs, examples)
  - Knowledge graph operations (embeddings, RAG)
  - Automated generation pipelines
- Modern, sophisticated terminology
- Communicates integration & orchestration
- Metaphorically rich ("fabric" = woven together)
Cons:
- Less universally known term
- Might require explanation for junior devs
- Could be seen as "buzzword-y"
Best For: Modern AI/ML platforms with:
- Multi-storage strategies
- Automated data pipelines
- Semantic understanding layers
- RAG/vector operations
Option 4: data_platform/ ⭐⭐⭐⭐
Product/Service Oriented Name
data_platform/
├── catalog/     # Data registry
├── services/    # Operational APIs
└── artifacts/   # Generated assets
Pros:
- Business-friendly terminology
- Implies productized capabilities
- Communicates value, not just structure
- Good for stakeholder communication
Cons:
- Might imply more infrastructure than exists
- "Platform" could be misleading at current scale
Best For: Data products, internal tooling, SaaS offerings
Recommendation: data_fabric/
Why Data Fabric Wins
Your system literally is a data fabric:
Your Implementation Matches Data Fabric Principles:
- Unified Access Pattern
  - Single directory structure
  - Consistent APIs across storage backends
  - Abstracted access patterns
- Active Metadata Management
  - JSON Schemas as active definitions
  - Auto-generated adapters (Pydantic, TypeScript, Drizzle)
  - Version-controlled configurations
- Knowledge Graph Operations
  - Vector embeddings in knowledge/
  - Semantic retrieval via RAG
  - Intent classification and routing
- Automated Orchestration
  - Config → Example generation
  - Schema → Adapter generation
  - Source → Runtime deployment
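The "consistent APIs across storage backends" point can be illustrated with a small interface. This is a minimal sketch, not the project's actual code: `StorageBackend`, `InMemoryBackend`, and `copy_key` are hypothetical names, and it assumes that a real PostgreSQL or Redis backend could expose the same two methods.

```python
from typing import Optional, Protocol


class StorageBackend(Protocol):
    """Hypothetical interface that every backend would implement."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryBackend:
    """Stand-in backend, used here for illustration only."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)

    def put(self, key: str, value: str) -> None:
        self._items[key] = value


def copy_key(src: StorageBackend, dst: StorageBackend, key: str) -> bool:
    """Copy one key between any two backends; return False if it is missing."""
    value = src.get(key)
    if value is None:
        return False
    dst.put(key, value)
    return True
```

Because `copy_key` depends only on the protocol, callers would not need to know which backend they are talking to, which is the property the "unified access" bullet describes.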
Proposed Final Structure
data_fabric/                 # The unified data architecture
│
├── definitions/             # Canonical source of truth
│   ├── schemas/             # JSON Schema (canonical)
│   ├── configs/             # Business rules & presets
│   ├── templates/           # Prompt templates
│   └── examples/            # Training examples (JSONL)
│
├── weave/                   # Integration & transformation
│   ├── knowledge/           # Embeddings, RAG, retrieval
│   ├── storage/             # Multi-backend abstractions
│   ├── prompts/             # Dynamic prompt builders
│   └── generators/          # Config → Example pipelines
│
├── views/                   # Materialized/generated outputs
│   ├── onboarding/          # Pipeline outputs
│   ├── contracts/           # Generated contracts
│   └── analytics/           # Computed views
│
└── README.md                # Architecture overview
Semantic Clarity
| Layer | Purpose | Metaphor |
|---|---|---|
| definitions/ | Source of truth | "The thread" |
| weave/ | Integration logic | "The loom" |
| views/ | Materialized outputs | "The fabric" |
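One way to keep the proposed structure honest after a migration is to check a directory against it. The sketch below is an assumption-laden illustration: `EXPECTED` is transcribed from the proposed tree, and `missing_dirs` is a hypothetical helper, not an existing project function.

```python
from pathlib import Path

# Expected layout, transcribed from the proposed data_fabric/ structure.
EXPECTED = {
    "definitions": ["schemas", "configs", "templates", "examples"],
    "weave": ["knowledge", "storage", "prompts", "generators"],
    "views": ["onboarding", "contracts", "analytics"],
}


def missing_dirs(root: Path) -> list[str]:
    """Return the expected subdirectories (as 'layer/child') absent under root."""
    missing = []
    for layer, children in EXPECTED.items():
        for child in children:
            if not (root / layer / child).is_dir():
                missing.append(f"{layer}/{child}")
    return missing
```

Calling `missing_dirs(Path("data_fabric"))` from the repository root would list whatever still needs to be created or moved; an empty list means the layout matches.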
Alternative: Stick with database/ + Add Context
If changing the name is too disruptive, you could:
database/                # Keep existing name
├── _ARCHITECTURE.md     # NEW: Explain it's a data fabric
├── canonical/           # Rename: SOURCE_OF_TRUTH
├── operational/         # Group: weave/ contents
└── materialized/        # Group: views/ contents
Pros:
- No breaking changes to the top-level path
- Maintains git history
- Less migration work
Cons:
- Perpetuates semantic confusion
- Doesn't signal architectural sophistication
- New team members might misunderstand
Migration Path (If Choosing data_fabric/)
Phase 1: Rename Directory (Low Risk)
git mv database data_fabric
# Update all import paths (GNU sed; \b is a GNU extension).
# Matches both "from database ..." and "import database", but not names like database_utils.
find . -type f -name "*.py" -exec sed -i -E 's/\b(from|import) database\b/\1 data_fabric/g' {} +
Phase 2: Restructure Internal Layout
cd data_fabric
mkdir -p definitions/schemas definitions/configs definitions/templates definitions/examples
mkdir -p weave/knowledge weave/storage weave/prompts weave/generators
mkdir -p views/onboarding views/contracts views/analytics
# Move existing files to new locations
Phase 3: Update Documentation
- Update all README files
- Regenerate architecture diagrams
- Update import statements in examples
Estimated Effort: 2-4 hours
Risk Level: Low (mostly file moves)
Breaking Changes: Import paths only
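Import rewriting is the step most likely to go wrong in Phase 1, because a plain text substitution can also touch similar names such as `database_utils`. As an alternative to `sed`, here is a minimal Python sketch; `rewrite_imports` is a hypothetical helper, and it deliberately ignores harder cases such as `import os, database`, dynamic imports, and module names inside strings.

```python
import re

# Matches "import database", "from database import x", and
# "from database.sub import x" at the start of a (possibly indented) line,
# but not "database_utils" or "mydatabase".
_IMPORT_RE = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)database(?=[\s.,]|$)",
    re.MULTILINE,
)


def rewrite_imports(source: str, new_name: str = "data_fabric") -> str:
    """Return source with the top-level package renamed in import statements."""
    return _IMPORT_RE.sub(lambda m: m.group(1) + new_name, source)
```

Running this over each `*.py` file and reviewing the resulting diff before committing keeps the rename auditable.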
Decision Matrix
| Criteria | data/ | data_layer/ | data_fabric/ | data_platform/ |
|---|---|---|---|---|
| Semantic Accuracy | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Industry Recognition | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Future-Proofing | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Team Onboarding | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Metaphor Richness | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| TOTAL | 18/25 | 19/25 | 21/25 | 20/25 |
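The TOTAL row is an unweighted sum of the star counts in each column. The sketch below recomputes it and lets you try different weights; the star counts are transcribed from the matrix, while `totals`, `best`, and the idea of weighting criteria are assumptions added for illustration.

```python
# Star counts transcribed from the decision matrix, in row order:
# semantic accuracy, industry recognition, future-proofing,
# team onboarding, metaphor richness.
SCORES = {
    "data/": [3, 5, 3, 5, 2],
    "data_layer/": [4, 4, 4, 4, 3],
    "data_fabric/": [5, 3, 5, 3, 5],
    "data_platform/": [4, 4, 4, 4, 4],
}


def totals(scores, weights=None):
    """Sum each option's stars, optionally scaling each criterion by a weight."""
    result = {}
    for name, stars in scores.items():
        w = weights or [1] * len(stars)
        result[name] = sum(s * k for s, k in zip(stars, w))
    return result


def best(scores, weights=None):
    """Return the option with the highest (weighted) total."""
    t = totals(scores, weights)
    return max(t, key=t.get)
```

Passing a weight list such as `[0, 1, 0, 1, 0]` shows how the ranking shifts if only industry recognition and team onboarding matter to your team.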
Final Recommendation
Choose data_fabric/ If:
- You want to signal architectural sophistication
- Your system truly integrates multiple data sources
- You're building for scale and complexity
- Team is technically mature
Choose data_layer/ If:
- You want familiar, safe terminology
- You prioritize simplicity over precision
- Team includes junior developers
- You want broad, immediate recognition
Stick with database/ If:
- Migration effort is too high right now
- Git history preservation is critical
- External integrations reference this path
- You add clarifying documentation
Subdirectory Naming (With data_fabric/)
Instead of SOURCE_OF_TRUTH/, use definitions/
Rationale:
- Shorter, more elegant
- Industry-standard term
- Pairs well with "data fabric" metaphor
- Implies "defining characteristics" of the data
Instead of Mixed Names, use Lifecycle Terms
definitions/   # What the data IS (canonical schemas, configs)
weave/         # How the data FLOWS (integration, transformation)
views/         # What the data BECOMES (materialized, generated)
Metaphor Consistency:
- Definitions = The thread (raw material)
- Weave = The loom (transformation process)
- Views = The fabric (finished product)
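The thread, loom, and fabric lifecycle can be read as a three-step pipeline. The following is a toy sketch only: the function names and the config fields (`entity`, `presets`) are invented for illustration and do not come from the project.

```python
import json
from typing import Iterable, Iterator


def load_definition() -> dict:
    """definitions/: a canonical config (hard-coded here, hypothetical fields)."""
    return {"entity": "user", "presets": ["basic", "premium"]}


def weave(definition: dict) -> Iterator[dict]:
    """weave/: derive one example record per preset in the definition."""
    for preset in definition["presets"]:
        yield {"entity": definition["entity"], "preset": preset}


def materialize(records: Iterable[dict]) -> str:
    """views/: serialize the derived records as JSONL text."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

Chaining the calls as `materialize(weave(load_definition()))` mirrors the "Config → Example generation" flow: nothing in `views/` is edited by hand, it is always regenerated from `definitions/`.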
Future Considerations
If you adopt data_fabric/:
- Next Addition: data_fabric/catalog/
  - Data lineage tracking
  - Data quality metrics
  - Schema registry interface
- Next Addition: data_fabric/governance/
  - Access control policies
  - Data retention rules
  - Compliance documentation
- Next Addition: data_fabric/observability/
  - Data flow monitoring
  - Quality dashboards
  - Performance metrics
This sets you up for true Data Fabric capabilities long-term.
TL;DR
Recommended: data_fabric/ with subdirectories:
- definitions/ (not SOURCE_OF_TRUTH/)
- weave/ (operational logic)
- views/ (materialized outputs)
Why: Your architecture literally is a data fabric, with unified access, active metadata, knowledge graph operations, and automated orchestration across multiple storage backends.
Migration Effort: 2-4 hours (mostly imports)
Alternative: Keep database/ but add _ARCHITECTURE.md explaining it's a data fabric implementation.