Embedding Drift Detection
Detect when an embedding model changes and re-embed documents to maintain search quality.
Overview
When you switch embedding models (e.g., from MiniLM-L6 to BGE-small-en), all existing vectors become incompatible -- even if both models produce the same dimensionality. Cosine similarity between vectors from different models produces nonsensical scores.
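To make the incompatibility concrete, here is a toy sketch. `embedA` and `embedB` are hypothetical stand-ins for two models, not real embedding models; they encode the same features into the same number of dimensions but on different axes, which is roughly the situation when two unrelated models happen to share a dimensionality.

```typescript
type Vec = number[];

// Standard cosine similarity between two equal-length vectors.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical model A: uses the feature axes as-is.
const embedA = (features: Vec): Vec => [...features];

// Hypothetical model B: same dimensionality, but the axes are shifted by one position.
const embedB = (features: Vec): Vec => [features[2], features[0], features[1]];

const doc: Vec = [1, 0, 0];

// Same text, same model: identical vectors.
const sameModel = cosine(embedA(doc), embedA(doc)); // 1

// Same text, different models: the score says "unrelated".
const crossModel = cosine(embedA(doc), embedB(doc)); // 0
```

The dimensions match, so nothing fails at the type or storage level; only the scores are wrong, which is why the mismatch has to be detected explicitly.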
See it in action
Try Semantic Search for a working demo of these APIs.
LocalMode tracks model provenance per collection via ModelFingerprint. On initialization, the stored fingerprint is compared against the current model. If they differ, a modelDriftDetected event fires so you can take action.
ModelFingerprint
A ModelFingerprint captures the identity of the embedding model that produced a collection's vectors:
interface ModelFingerprint {
  modelId: string; // e.g., 'Xenova/bge-small-en-v1.5'
  provider: string; // e.g., 'transformers'
  dimensions: number; // e.g., 384
}

The fingerprint is derived automatically from EmbeddingModel.modelId, EmbeddingModel.provider, and EmbeddingModel.dimensions -- no changes to the model interface are needed.
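As a rough illustration of that derivation, the sketch below copies the three fields from a model-like object and compares two fingerprints field by field. `EmbeddingModelLike`, `toFingerprint`, and `sameFingerprint` are hypothetical names used only here; the library's own implementation may differ.

```typescript
interface ModelFingerprint {
  modelId: string;
  provider: string;
  dimensions: number;
}

// Hypothetical stand-in for the fields of EmbeddingModel that the fingerprint reads.
interface EmbeddingModelLike {
  modelId: string;
  provider: string;
  dimensions: number;
}

// Copy only the identifying fields; anything else on the model is ignored.
function toFingerprint(model: EmbeddingModelLike): ModelFingerprint {
  return {
    modelId: model.modelId,
    provider: model.provider,
    dimensions: model.dimensions,
  };
}

// Two fingerprints match only if all three fields are equal.
function sameFingerprint(a: ModelFingerprint, b: ModelFingerprint): boolean {
  return (
    a.modelId === b.modelId &&
    a.provider === b.provider &&
    a.dimensions === b.dimensions
  );
}

const bge = toFingerprint({
  modelId: 'Xenova/bge-small-en-v1.5',
  provider: 'transformers',
  dimensions: 384,
});
const other = toFingerprint({
  modelId: 'example/other-model', // hypothetical model id
  provider: 'transformers',
  dimensions: 384,
});
```

Here `bge` and `other` share a provider and dimensionality but still do not match, because the model IDs differ.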
Enabling Drift Detection
Pass your embedding model when creating a VectorDB:
import { createVectorDB } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5');

const db = await createVectorDB({
  name: 'my-docs',
  dimensions: 384,
  model, // Enables drift detection
});

On first use, the model's fingerprint is stored with the collection. On subsequent initializations, the stored fingerprint is compared to the current model.
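The store-then-compare flow can be sketched with an in-memory map. This is an illustration of the idea, not the library's code: `fingerprintStore`, `initCollection`, and the `onDrift` callback are hypothetical, and the sketch assumes the stored fingerprint is left unchanged when drift is found.

```typescript
interface ModelFingerprint {
  modelId: string;
  provider: string;
  dimensions: number;
}

type DriftListener = (stored: ModelFingerprint, current: ModelFingerprint) => void;

// Hypothetical in-memory stand-in for the collection's persisted metadata.
const fingerprintStore = new Map<string, ModelFingerprint>();

function initCollection(
  name: string,
  current: ModelFingerprint,
  onDrift: DriftListener,
): 'stored' | 'match' | 'drift' {
  const stored = fingerprintStore.get(name);

  // First use: remember which model produced this collection's vectors.
  if (!stored) {
    fingerprintStore.set(name, { ...current });
    return 'stored';
  }

  const same =
    stored.modelId === current.modelId &&
    stored.provider === current.provider &&
    stored.dimensions === current.dimensions;
  if (same) return 'match';

  // Later use with a different model: notify the caller.
  // Assumption: the stored fingerprint is not overwritten here.
  onDrift(stored, current);
  return 'drift';
}
```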
Checking Compatibility
Use checkModelCompatibility() for a read-only check that does not modify storage or emit events:
import { checkModelCompatibility } from '@localmode/core';

const result = await checkModelCompatibility(db, newModel);

switch (result.status) {
  case 'compatible':
    console.log('Models match -- no action needed');
    break;
  case 'incompatible':
    console.log(`Model changed: ${result.storedModel?.modelId} -> ${result.currentModel.modelId}`);
    console.log(`${result.documentCount} documents may need re-embedding`);
    break;
  case 'dimension-mismatch':
    console.log('Dimension mismatch -- cannot use this model with existing data');
    break;
}

ModelCompatibilityResult
| Field | Type | Description |
|---|---|---|
| status | 'compatible' \| 'incompatible' \| 'dimension-mismatch' | Compatibility status |
| storedModel | ModelFingerprint \| null | Stored fingerprint (null if none stored) |
| currentModel | ModelFingerprint | Current model's fingerprint |
| documentCount | number | Number of documents in the collection |
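One plausible way to arrive at the three statuses is sketched below. `classifyCompatibility` is a hypothetical helper; the order of the checks (dimensions first, then model identity) is an assumption, while the rule that a missing fingerprint counts as compatible follows the backward-compatibility behavior described in this guide.

```typescript
interface ModelFingerprint {
  modelId: string;
  provider: string;
  dimensions: number;
}

type CompatibilityStatus = 'compatible' | 'incompatible' | 'dimension-mismatch';

function classifyCompatibility(
  stored: ModelFingerprint | null,
  current: ModelFingerprint,
): CompatibilityStatus {
  // No stored fingerprint: nothing to conflict with.
  if (stored === null) return 'compatible';

  // Assumption: a dimension difference is reported before a model-identity difference.
  if (stored.dimensions !== current.dimensions) return 'dimension-mismatch';

  // Same dimensions but a different model or provider: vectors need re-embedding.
  if (stored.modelId !== current.modelId || stored.provider !== current.provider) {
    return 'incompatible';
  }

  return 'compatible';
}
```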
Reindexing
When drift is detected, use reindexCollection() to re-embed all documents with the new model:
import { reindexCollection } from '@localmode/core';
const result = await reindexCollection(db, newModel, {
  batchSize: 32,
  onProgress: ({ completed, total, skipped, phase }) => {
    console.log(`${phase}: ${completed}/${total} (${skipped} skipped)`);
  },
});

console.log(`Reindexed ${result.reindexed}, skipped ${result.skipped} in ${result.durationMs}ms`);

How Text is Found
By default, reindexCollection() looks for source text in document metadata using these fields (in order):
_text, text, content, body, __text, pageContent
Documents without text in any of these fields are skipped (not re-embedded). The skip count is reported in the result and progress events.
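The ordered lookup can be pictured as a simple loop over the field names listed above. `findSourceText` is a hypothetical helper written for illustration; treating empty strings as missing text is an assumption, not documented behavior.

```typescript
// Field names in the order listed above.
const DEFAULT_TEXT_FIELDS = ['_text', 'text', 'content', 'body', '__text', 'pageContent'];

// Return the first string value found, or null if the document should be skipped.
function findSourceText(metadata: Record<string, unknown>): string | null {
  for (const field of DEFAULT_TEXT_FIELDS) {
    const value = metadata[field];
    // Assumption: non-string values and empty strings do not count as text.
    if (typeof value === 'string' && value.length > 0) {
      return value;
    }
  }
  return null;
}

const fromText = findSourceText({ body: 'from body', text: 'from text' }); // 'from text'
const skippedDoc = findSourceText({ title: 'no text fields here' }); // null
```

Because `text` comes before `body` in the list, the first call returns `'from text'` even though both fields are present.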
Custom Text Extraction
If your documents store text in a non-standard field:
await reindexCollection(db, newModel, {
  textExtractor: (metadata) => {
    if (typeof metadata.rawContent === 'string') {
      return metadata.rawContent;
    }
    return null; // Skip this document
  },
});

ReindexOptions
| Option | Type | Default | Description |
|---|---|---|---|
| abortSignal | AbortSignal | -- | Cancel the operation |
| onProgress | (progress: ReindexProgress) => void | -- | Progress callback |
| queue | InferenceQueue | -- | Background scheduling via inference queue |
| batchSize | number | 50 | Documents per embedding batch |
| textExtractor | (metadata) => string \| null | -- | Custom text extraction |
| textField | string | '_text' | Primary metadata field for text |
ReindexResult
| Field | Type | Description |
|---|---|---|
| reindexed | number | Documents successfully re-embedded |
| skipped | number | Documents skipped (no text found) |
| durationMs | number | Total duration in milliseconds |
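To show how the batch size, progress callback, abort signal, and result counts fit together, here is a minimal in-memory sketch of a batched re-embedding loop. It is not the library's implementation: `Doc`, `EmbedMany`, and `reindexSketch` are hypothetical, text is read only from a `text` field, `completed` is assumed to count every processed document (skipped ones included), and an aborted run is assumed to stop quietly between batches.

```typescript
interface Doc {
  id: string;
  metadata: Record<string, unknown>;
  vector: number[];
}

interface SketchOptions {
  batchSize?: number;
  signal?: { readonly aborted: boolean }; // an AbortSignal fits this shape
  onProgress?: (p: { completed: number; total: number; skipped: number }) => void;
}

// Hypothetical embedding function: one vector per input text.
type EmbedMany = (texts: string[]) => Promise<number[][]>;

async function reindexSketch(
  docs: Doc[],
  embedMany: EmbedMany,
  options: SketchOptions = {},
): Promise<{ reindexed: number; skipped: number; durationMs: number }> {
  const { batchSize = 50, signal, onProgress } = options;
  const started = Date.now();
  let reindexed = 0;
  let skipped = 0;

  for (let start = 0; start < docs.length; start += batchSize) {
    // Check for cancellation once per batch.
    if (signal?.aborted) break;

    const batch = docs.slice(start, start + batchSize);
    // Documents without a string `text` field cannot be re-embedded.
    const withText = batch.filter((d) => typeof d.metadata['text'] === 'string');
    skipped += batch.length - withText.length;

    if (withText.length > 0) {
      const vectors = await embedMany(withText.map((d) => d.metadata['text'] as string));
      withText.forEach((d, i) => {
        d.vector = vectors[i];
      });
      reindexed += withText.length;
    }

    onProgress?.({ completed: reindexed + skipped, total: docs.length, skipped });
  }

  return { reindexed, skipped, durationMs: Date.now() - started };
}
```

Embedding per batch rather than per document keeps the number of model calls low, and reporting progress after each batch gives the caller a natural point to update a progress bar.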
Resumability
If a reindex operation is interrupted (tab closed, abort, crash), the progress cursor is persisted in the meta store. The next call to reindexCollection() with the same target model automatically resumes from where it left off.
A stale cursor (from a different target model) is discarded, and reindex starts fresh.
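The cursor logic amounts to remembering the target model alongside the position. The sketch below uses an in-memory map in place of the persisted meta store; `ReindexCursor`, `saveCursor`, and `resolveStartIndex` are hypothetical names, and the cursor shape is an assumption.

```typescript
interface ReindexCursor {
  targetModelId: string; // model the interrupted reindex was converting to
  nextIndex: number; // first document that has not been processed yet
}

// Hypothetical in-memory stand-in for the persisted meta store.
const cursorStore = new Map<string, ReindexCursor>();

// Called after each batch so an interruption loses at most one batch of work.
function saveCursor(collection: string, targetModelId: string, nextIndex: number): void {
  cursorStore.set(collection, { targetModelId, nextIndex });
}

// Decide where a new reindex call should start.
function resolveStartIndex(collection: string, targetModelId: string): number {
  const cursor = cursorStore.get(collection);
  if (cursor && cursor.targetModelId === targetModelId) {
    return cursor.nextIndex; // same target model: resume
  }
  // Missing or stale cursor: discard it and start from the beginning.
  cursorStore.delete(collection);
  return 0;
}
```

Keying the cursor to the target model matters because vectors written during an interrupted run toward one model are useless if the next run targets a different one.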
Inference Queue Integration
Submit reindex batches at 'background' priority so interactive operations are not blocked:
import { createInferenceQueue, reindexCollection } from '@localmode/core';

const queue = createInferenceQueue({ concurrency: 1 });

await reindexCollection(db, newModel, {
  queue, // Each batch runs at 'background' priority
});

Cross-Tab Safety
reindexCollection() acquires an exclusive write lock via the Web Locks API (reindex_{collectionId}). Only one tab can reindex a collection at a time. Other tabs wait for the lock to be released.
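The locking pattern itself is small. The sketch below wraps a task in an exclusive lock named after the collection, using the `request(name, options, callback)` shape of the Web Locks API. `LockManagerLike` and `withReindexLock` are hypothetical names; in a browser the intended argument would be `navigator.locks`, and how the library wires this internally is not shown here.

```typescript
// Minimal structural type for the part of the Web Locks API used here.
interface LockManagerLike {
  request<T>(
    name: string,
    options: { mode: 'exclusive' | 'shared' },
    callback: () => Promise<T>,
  ): Promise<T>;
}

// Run `task` while holding an exclusive lock for the collection.
// The lock is held until the promise returned by `task` settles.
async function withReindexLock<T>(
  locks: LockManagerLike,
  collectionId: string,
  task: () => Promise<T>,
): Promise<T> {
  return locks.request(`reindex_${collectionId}`, { mode: 'exclusive' }, task);
}
```

Because the lock name includes the collection ID, two tabs can reindex different collections at the same time while still being serialized on the same collection.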
Events
Subscribe to drift detection and reindex lifecycle events via globalEventBus:
import { globalEventBus } from '@localmode/core';

globalEventBus.on('modelDriftDetected', ({ collection, storedModel, currentModel, documentCount }) => {
  console.warn(`Model drift in "${collection}": ${storedModel.modelId} -> ${currentModel.modelId}`);
});

globalEventBus.on('reindexStart', ({ collection, total, resumed }) => {
  console.log(`Reindex started: ${total} docs${resumed ? ' (resumed)' : ''}`);
});

globalEventBus.on('reindexProgress', ({ collection, completed, total, skipped, phase }) => {
  console.log(`${phase}: ${completed}/${total}`);
});

globalEventBus.on('reindexComplete', ({ collection, reindexed, skipped, durationMs }) => {
  console.log(`Done: ${reindexed} reindexed, ${skipped} skipped in ${durationMs}ms`);
});

React Hook
The useReindex hook from @localmode/react wraps reindexCollection() with React state:
import { useReindex } from '@localmode/react';

function ReindexPanel({ db, newModel }) {
  const { isReindexing, progress, error, reindex, cancel, clearError } = useReindex({
    db,
    model: newModel,
  });

  return (
    <div>
      {isReindexing && progress && (
        <progress value={progress.completed} max={progress.total} />
      )}
      <button onClick={reindex} disabled={isReindexing}>Start Reindex</button>
      <button onClick={cancel} disabled={!isReindexing}>Cancel</button>
      {error && <p>{error.message} <button onClick={clearError}>Dismiss</button></p>}
    </div>
  );
}

UseReindexReturn
| Field | Type | Description |
|---|---|---|
| isReindexing | boolean | Whether reindexing is in progress |
| progress | ReindexProgress \| null | Current progress |
| error | { message: string } \| null | Error if failed |
| reindex | () => Promise<ReindexResult \| null> | Start reindexing |
| cancel | () => void | Cancel the operation |
| clearError | () => void | Clear error state |
Backward Compatibility
- Collections created without a model option have no stored fingerprint and work exactly as before.
- checkModelCompatibility() returns status: 'compatible' with storedModel: null for collections without a fingerprint.
- No breaking changes to existing APIs.
Helper Functions
import { extractFingerprint, fingerprintsMatch } from '@localmode/core';
// Derive fingerprint from any EmbeddingModel
const fp = extractFingerprint(model);
// Compare two fingerprints
if (!fingerprintsMatch(storedFp, currentFp)) {
  console.log('Models differ');
}