Vector Database Integration
Our documentation search is powered by a vector database that embeds all our knowledge sources, from technical specs to business intelligence, enabling fast, context-aware semantic search.
The vector database is managed by our DocsCrewImprover library and is populated by the ingest-docs script.
Ingestion Pipeline
The process of converting our documentation into a searchable vector index is handled by an automated pipeline.
1. Source Collection
The DocsCrewImprover class is configured to gather documents from multiple high-priority sources across the monorepo:
// From: apps/docs-site/lib/docs-crew-improver.ts
this.sources = [
{ name: 'kiro-specs', path: '../../../.kiro/specs', type: 'specs', priority: 10 },
{ name: 'shared-schemas', path: '../../../data_layers/shared', type: 'shared', priority: 9 },
{ name: 'claude-templates', path: '../../../.claude', type: 'claude', priority: 8 },
{ name: 'project-docs', path: '../../../docs', type: 'project', priority: 7 },
{ name: 'api-schemas', path: '../../../data_layer/schemas', type: 'api', priority: 6 }
];
2. Document Processing
Each document is read, processed, and enriched with metadata:
- Text Splitting: Large documents are split into smaller, semantically coherent chunks using RecursiveCharacterTextSplitter.
- Metadata Enrichment: Each chunk is tagged with metadata, including source, type, priority, complexity, and contextual information related to our Claude.ai integration.
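As a rough illustration of these two steps, the sketch below splits text on progressively finer separators and tags each chunk with metadata. It is a simplified stand-in, not LangChain's RecursiveCharacterTextSplitter and not the DocsCrewImprover code: the names splitRecursively and toChunks, the ChunkMetadata shape, and the chunk size are assumptions made for illustration.

```typescript
// Simplified stand-in for recursive character splitting (not LangChain's implementation).
// All names and the metadata shape here are hypothetical.

interface SourceConfig {
  name: string;
  type: string;
  priority: number;
}

interface ChunkMetadata {
  source: string;
  type: string;
  priority: number;
  chunkIndex: number;
}

interface Chunk {
  text: string;
  metadata: ChunkMetadata;
}

// Try the coarsest separator first; recurse with finer ones for oversized pieces.
function splitRecursively(
  text: string,
  maxLength: number,
  separators: string[] = ["\n\n", "\n", " "],
): string[] {
  if (text.length <= maxLength) return text.length > 0 ? [text] : [];
  if (separators.length === 0) {
    // No separators left: fall back to a hard cut at maxLength.
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxLength) {
      pieces.push(text.slice(i, i + maxLength));
    }
    return pieces;
  }

  const [separator, ...finer] = separators;
  const chunks: string[] = [];
  let current = "";

  for (const part of text.split(separator)) {
    const candidate = current ? current + separator + part : part;
    if (candidate.length <= maxLength) {
      current = candidate; // keep merging small parts into one chunk
      continue;
    }
    if (current) chunks.push(current);
    if (part.length > maxLength) {
      // This part alone is too large: split it with the finer separators.
      chunks.push(...splitRecursively(part, maxLength, finer));
      current = "";
    } else {
      current = part;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Attach source-level metadata to every chunk of a document.
function toChunks(text: string, source: SourceConfig, maxLength: number): Chunk[] {
  return splitRecursively(text, maxLength).map((piece, chunkIndex) => ({
    text: piece,
    metadata: { source: source.name, type: source.type, priority: source.priority, chunkIndex },
  }));
}
```

For example, calling toChunks(text, { name: 'kiro-specs', type: 'specs', priority: 10 }, 1000) would yield chunks of at most 1000 characters, each carrying the source name, type, and priority of its origin.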
3. Vector Embedding
The processed text chunks are converted into numerical vectors using OpenAI's text-embedding-3-large model. This allows us to perform semantic similarity searches.
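To make semantic similarity concrete: one common way to compare embeddings is cosine similarity, with the closest chunks returned first. The sketch below uses tiny hand-written vectors in place of real model output. The names cosineSimilarity and rankBySimilarity and the sample data are illustrative assumptions, and the distance metric our vector store actually uses may differ.

```typescript
// Hypothetical helper names; toy 3-dimensional vectors stand in for real embeddings.

interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the ids of the k chunks most similar to the query, best match first.
function rankBySimilarity(query: number[], chunks: EmbeddedChunk[], k: number): string[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.id);
}

const sampleChunks: EmbeddedChunk[] = [
  { id: "auth-spec", embedding: [1, 0, 0] },
  { id: "billing-schema", embedding: [0, 1, 0] },
  { id: "login-guide", embedding: [0.9, 0.1, 0] },
];

const topMatches = rankBySimilarity([1, 0.05, 0], sampleChunks, 2);
console.log(topMatches);
```

Because the score depends only on the angle between vectors, two chunks can match a query closely even when they share few exact words, as long as the embedding model places them near each other.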
4. Storage
The embeddings and their corresponding metadata are stored in a local ChromaDB instance located at .chroma_docs_improved.
Running the Ingestion Script
To update the vector database with the latest documentation, run the ingestion script from the apps/docs-site directory:
npm run ingest-docs
This command executes the scripts/ingest-docs.ts file, which orchestrates the entire pipeline.
Script Workflow:
- Initializes docsCrewImprover.
- Calls ingestAllSources() to process and embed documents from all configured locations.
- (Future) Calls generateProjectDocumentation() to create AI-generated summaries.
- Outputs system statistics, including the total number of documents indexed.
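The workflow above might be orchestrated roughly as follows. This is a sketch under stated assumptions, not the contents of scripts/ingest-docs.ts: the IngestionCrew interface, the getStats() method, the Stats shape, and the stub crew are all hypothetical. Only ingestAllSources() and generateProjectDocumentation() are names taken from this page.

```typescript
// Hypothetical shapes: only the two ingest/generate method names come from this page.
interface Stats {
  totalDocuments: number;
}

interface IngestionCrew {
  ingestAllSources(): Promise<void>;
  generateProjectDocumentation?(): Promise<void>; // listed as a future step
  getStats(): Promise<Stats>; // assumed name for the statistics call
}

async function runIngestion(
  crew: IngestionCrew,
  log: (line: string) => void,
): Promise<Stats> {
  log("Ingesting all configured sources...");
  await crew.ingestAllSources();

  // Only run the summary step if the crew implements it.
  if (crew.generateProjectDocumentation) {
    log("Generating project documentation...");
    await crew.generateProjectDocumentation();
  }

  const stats = await crew.getStats();
  log(`Indexed ${stats.totalDocuments} documents`);
  return stats;
}

// Stub crew so the sketch runs without a vector store or API key.
function createStubCrew(documentCount: number): IngestionCrew {
  let indexed = 0;
  return {
    async ingestAllSources() {
      indexed = documentCount;
    },
    async getStats() {
      return { totalDocuments: indexed };
    },
  };
}
```

Passing the crew in as a parameter keeps the orchestration testable: runIngestion(createStubCrew(3), console.log) exercises the same sequence of steps without touching the embedding API or the database.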
Run this script whenever you make significant changes to the source documentation, such as files in .kiro/specs or data_layers/shared, so that search results stay up to date.
Technical Details
- Vector Store: ChromaDB
- Embedding Model: OpenAI text-embedding-3-large
- Orchestration: LangChain.js
- Location: The database is stored locally within the apps/docs-site directory at .chroma_docs_improved. It should be added to .gitignore to avoid checking it into version control.