Vector Database Integration

Our documentation search is backed by a vector database that stores embeddings of all our knowledge sources, from technical specs to business intelligence, so that queries are matched by meaning rather than by exact keywords.

The vector database is managed by our DocsCrewImprover library and is populated by the ingest-docs script.

Ingestion Pipeline

An automated pipeline converts our documentation into a searchable vector index in four steps.

1. Source Collection

The DocsCrewImprover class is configured to gather documents from multiple high-priority sources across the monorepo:

// From: apps/docs-site/lib/docs-crew-improver.ts
this.sources = [
  { name: 'kiro-specs', path: '../../../.kiro/specs', type: 'specs', priority: 10 },
  { name: 'shared-schemas', path: '../../../data_layers/shared', type: 'shared', priority: 9 },
  { name: 'claude-templates', path: '../../../.claude', type: 'claude', priority: 8 },
  { name: 'project-docs', path: '../../../docs', type: 'project', priority: 7 },
  { name: 'api-schemas', path: '../../../data_layer/schemas', type: 'api', priority: 6 }
];
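The source does not say how the priority field is consumed. As a minimal sketch, assuming a higher number means a more important source, the entries could be typed and ordered as follows. The DocSource interface and the byPriority helper are hypothetical names used only for illustration.

```typescript
// Hypothetical shape for one entry of the sources array shown above.
interface DocSource {
  name: string;
  path: string; // relative path, as in the configuration above
  type: string;
  priority: number; // assumption: higher means more important
}

// A subset of the configured sources, deliberately listed out of order.
const sources: DocSource[] = [
  { name: 'kiro-specs', path: '../../../.kiro/specs', type: 'specs', priority: 10 },
  { name: 'project-docs', path: '../../../docs', type: 'project', priority: 7 },
  { name: 'shared-schemas', path: '../../../data_layers/shared', type: 'shared', priority: 9 },
];

// Return a new array sorted from highest to lowest priority.
// The input array is copied first, so it is not mutated.
function byPriority(list: readonly DocSource[]): DocSource[] {
  return [...list].sort((a, b) => b.priority - a.priority);
}

const ordered = byPriority(sources).map((s) => s.name);
console.log(ordered);
```

Typing the entries this way lets the compiler catch a missing or misspelled field before the ingestion run starts.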

2. Document Processing

Each document is read, processed, and enriched with metadata:

Text Splitting: Large documents are split into smaller, semantically coherent chunks using RecursiveCharacterTextSplitter.
Metadata Enrichment: Each chunk is tagged with metadata, including source, type, priority, complexity, and contextual information related to our Claude.ai integration.
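The two steps above can be sketched without any dependencies. The splitter below is a simplified stand-in for RecursiveCharacterTextSplitter, not LangChain's implementation: it has no chunk overlap and uses a fixed separator list. The ChunkMetadata shape, the chunkIndex field, and the function names are assumptions made for illustration.

```typescript
// Assumed metadata shape; the real library also records complexity and other context.
interface ChunkMetadata {
  source: string;
  type: string;
  priority: number;
  chunkIndex: number; // hypothetical field: position of the chunk within its document
}

interface Chunk {
  text: string;
  metadata: ChunkMetadata;
}

// Split on the coarsest separator first, greedily packing parts into chunks of at
// most maxLen characters; parts that are still too long are split on finer separators.
function splitRecursive(
  text: string,
  maxLen: number,
  separators: string[] = ['\n\n', '\n', ' ']
): string[] {
  if (text.length <= maxLen) return text.trim() ? [text.trim()] : [];
  if (separators.length === 0) {
    // No separators left: hard-cut into fixed-size pieces.
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxLen) pieces.push(text.slice(i, i + maxLen));
    return pieces;
  }
  const [sep, ...rest] = separators;
  const chunks: string[] = [];
  let current = '';
  for (const part of text.split(sep)) {
    const candidate = current ? current + sep + part : part;
    if (candidate.length <= maxLen) {
      current = candidate;
      continue;
    }
    if (current.trim()) chunks.push(current.trim());
    current = '';
    if (part.length <= maxLen) current = part;
    else chunks.push(...splitRecursive(part, maxLen, rest));
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Attach the source-level metadata plus a running index to every chunk.
function toChunks(text: string, base: Omit<ChunkMetadata, 'chunkIndex'>, maxLen: number): Chunk[] {
  return splitRecursive(text, maxLen).map((t, i) => ({
    text: t,
    metadata: { ...base, chunkIndex: i },
  }));
}

const sample = 'Intro paragraph.\n\nSecond paragraph that is a bit longer than the limit.';
const chunks = toChunks(sample, { source: 'kiro-specs', type: 'specs', priority: 10 }, 30);
console.log(chunks.map((c) => c.text));
```

Splitting on paragraph breaks before falling back to lines and words keeps related sentences together where the size limit allows it.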

3. Vector Embedding

The processed text chunks are converted into numerical vectors using OpenAI's text-embedding-3-large model. Because texts with similar meaning map to nearby vectors, this lets us run semantic similarity searches.
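To illustrate what similarity between embeddings means, here is a small sketch that ranks documents by cosine similarity to a query vector. Cosine similarity is one common measure; the source does not state which metric the ChromaDB collection actually uses. The three-dimensional vectors are toy values, since real embeddings have far more dimensions.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
// Returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings.
const query = [1, 0, 1];
const docs = [
  { id: 'b', vector: [0, 1, 0] },
  { id: 'a', vector: [0.9, 0.1, 1] },
];

// Score every document against the query and sort by descending similarity.
const ranked = docs
  .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked);
```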

4. Storage

The embeddings and their corresponding metadata are stored in a local ChromaDB instance located at .chroma_docs_improved.
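As a rough mental model of what gets stored, the sketch below keeps records in memory, each holding an id, an embedding, the chunk text, and its metadata, and answers nearest-neighbour queries with an optional metadata filter. It is a stand-in for illustration only and does not use the ChromaDB client; the class name, method names, and the choice of squared Euclidean distance are assumptions.

```typescript
type Metadata = Record<string, string | number>;

interface StoredRecord {
  id: string;
  embedding: number[];
  document: string;
  metadata: Metadata;
}

// Sum of squared component differences between two equal-length vectors.
function squaredDistance(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}

// Hypothetical in-memory stand-in for a vector store.
class InMemoryVectorStore {
  private records: StoredRecord[] = [];

  // Insert or overwrite by id, so re-ingesting a chunk does not duplicate it.
  upsert(record: StoredRecord): void {
    this.records = this.records.filter((r) => r.id !== record.id);
    this.records.push(record);
  }

  count(): number {
    return this.records.length;
  }

  // Return the k nearest records, optionally restricted to exact metadata matches.
  query(embedding: number[], k: number, where: Metadata = {}): StoredRecord[] {
    return this.records
      .filter((r) => Object.keys(where).every((key) => r.metadata[key] === where[key]))
      .map((r) => ({ record: r, distance: squaredDistance(embedding, r.embedding) }))
      .sort((x, y) => x.distance - y.distance)
      .slice(0, k)
      .map((x) => x.record);
  }
}

const store = new InMemoryVectorStore();
store.upsert({ id: 'specs-0', embedding: [1, 0], document: 'old text', metadata: { type: 'specs', priority: 10 } });
store.upsert({ id: 'specs-1', embedding: [0, 1], document: 'spec B', metadata: { type: 'specs', priority: 10 } });
store.upsert({ id: 'project-0', embedding: [0.9, 0.1], document: 'guide', metadata: { type: 'project', priority: 7 } });
// Re-ingesting the same id replaces the earlier record.
store.upsert({ id: 'specs-0', embedding: [1, 0], document: 'spec A', metadata: { type: 'specs', priority: 10 } });

const nearest = store.query([1, 0], 2).map((r) => r.id);
const specsOnly = store.query([1, 0], 2, { type: 'specs' }).map((r) => r.id);
console.log(store.count(), nearest, specsOnly);
```

Keying records by a stable id is what makes repeated ingestion runs idempotent in this sketch.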

Running the Ingestion Script

To update the vector database with the latest documentation, run the ingestion script from the apps/docs-site directory:

npm run ingest-docs

This command executes the scripts/ingest-docs.ts file, which orchestrates the entire pipeline.

Script Workflow:

Initializes docsCrewImprover.
Calls ingestAllSources() to process and embed documents from all configured locations.
(Future) Calls generateProjectDocumentation() to create AI-generated summaries.
Outputs system statistics, including the total number of documents indexed.
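The workflow above might look roughly like the following. Only ingestAllSources() comes from the steps listed here; the DocsImprover interface, initialize(), getStats(), and the stats shape are hypothetical, and the stub object stands in for the real docsCrewImprover instance so the sketch runs on its own.

```typescript
// Hypothetical interface covering only what this workflow needs.
interface DocsImprover {
  initialize(): Promise<void>; // assumed name
  ingestAllSources(): Promise<void>;
  getStats(): Promise<{ totalDocuments: number }>; // assumed name and shape
}

// Run the steps in order and return the number of indexed documents.
async function runIngestion(
  improver: DocsImprover,
  log: (message: string) => void = console.log
): Promise<number> {
  log('Initializing...');
  await improver.initialize();

  log('Ingesting all configured sources...');
  await improver.ingestAllSources();

  // The future generateProjectDocumentation() step would go here.

  const stats = await improver.getStats();
  log('Documents indexed: ' + stats.totalDocuments);
  return stats.totalDocuments;
}

// Stub that records the order of calls instead of touching any database.
const calls: string[] = [];
const stubImprover: DocsImprover = {
  async initialize() { calls.push('initialize'); },
  async ingestAllSources() { calls.push('ingestAllSources'); },
  async getStats() { calls.push('getStats'); return { totalDocuments: 42 }; },
};

const done = runIngestion(stubImprover);
```

Passing the improver in as a parameter keeps the orchestration testable without a live embedding model or database.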

⚠️ Rerun this script whenever you make significant changes to the source documentation in any configured location (.kiro/specs, data_layers/shared, and so on); otherwise search results will be served from a stale index.

Technical Details

Vector Store: ChromaDB
Embedding Model: OpenAI text-embedding-3-large
Orchestration: LangChain.js
Location: The database is stored locally within the apps/docs-site directory at .chroma_docs_improved. It should be added to .gitignore to avoid checking it into version control.
