Embedding Drift Detection

Detect when an embedding model changes and re-embed documents to maintain search quality.

Overview

When you switch embedding models (e.g., from MiniLM-L6 to BGE-small-en), all existing vectors become incompatible -- even if both models produce the same dimensionality. Each model defines its own vector space, so cosine similarity between vectors from different models yields meaningless scores.
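
To see why matching dimensionality is not enough, consider the toy sketch below. It is not LocalMode code: `modelA` and `modelB` are hypothetical stand-ins for two embedding models, where B's space is simply A's space rotated by 90 degrees. Similarity inside either space behaves sensibly, but comparing a vector from A with a vector from B says nothing about the text.

```typescript
// Toy illustration only: two hypothetical "models" that embed text into 2-D space.
type Vec = [number, number];

function cosine(a: Vec, b: Vec): number {
  const dot = a[0] * b[0] + a[1] * b[1];
  const norm = Math.hypot(a[0], a[1]) * Math.hypot(b[0], b[1]);
  return dot / norm;
}

// Hypothetical model A: a fixed lookup standing in for a real embedding.
const modelA = (text: string): Vec => (text === 'cat' ? [1, 0] : [0.9, 0.1]);

// Hypothetical model B: the same geometry as A, rotated by 90 degrees.
const modelB = (text: string): Vec => {
  const [x, y] = modelA(text);
  return [-y, x];
};

// Within one space, related texts score high (about 0.99 here).
const sameSpace = cosine(modelA('cat'), modelA('kitten'));
// The rotation preserves that score inside model B's own space.
const alsoSameSpace = cosine(modelB('cat'), modelB('kitten'));
// Across spaces, the *same* text scores 0, as if it were unrelated to itself.
const crossSpace = cosine(modelA('cat'), modelB('cat'));

console.log({ sameSpace, alsoSameSpace, crossSpace });
```

Real models differ by far more than a rotation, but the conclusion is the same: a query embedded with the new model cannot be ranked against vectors stored by the old one.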

See it in action

Try Semantic Search for a working demo of these APIs.

LocalMode tracks model provenance per collection via ModelFingerprint. On initialization, the stored fingerprint is compared against the current model. If they differ, a modelDriftDetected event fires so you can take action.

ModelFingerprint

A ModelFingerprint captures the identity of the embedding model that produced a collection's vectors:

interface ModelFingerprint {
  modelId: string;   // e.g., 'Xenova/bge-small-en-v1.5'
  provider: string;  // e.g., 'transformers'
  dimensions: number; // e.g., 384
}

The fingerprint is derived automatically from EmbeddingModel.modelId, EmbeddingModel.provider, and EmbeddingModel.dimensions -- no changes to the model interface are needed.
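
As a rough mental model, deriving and comparing fingerprints could look like the sketch below. The function names `sketchFingerprint` and `sketchFingerprintsMatch` are hypothetical and only illustrate the idea; the library's own helpers are `extractFingerprint` and `fingerprintsMatch`, whose internals may differ.

```typescript
// Sketch only; in LocalMode the real helpers are extractFingerprint and fingerprintsMatch.
interface ModelFingerprint {
  modelId: string;
  provider: string;
  dimensions: number;
}

// Hypothetical: copy only the three identity fields, ignoring anything else on the model.
function sketchFingerprint(model: ModelFingerprint): ModelFingerprint {
  const { modelId, provider, dimensions } = model;
  return { modelId, provider, dimensions };
}

// Hypothetical: two fingerprints match only if every field is equal.
function sketchFingerprintsMatch(a: ModelFingerprint, b: ModelFingerprint): boolean {
  return a.modelId === b.modelId && a.provider === b.provider && a.dimensions === b.dimensions;
}

const miniLm = sketchFingerprint({
  modelId: 'Xenova/all-MiniLM-L6-v2',
  provider: 'transformers',
  dimensions: 384,
});
const bge = sketchFingerprint({
  modelId: 'Xenova/bge-small-en-v1.5',
  provider: 'transformers',
  dimensions: 384,
});

// Same dimensions, different model: the fingerprints do not match.
console.log(sketchFingerprintsMatch(miniLm, bge)); // false
```

Comparing `modelId` and `provider` in addition to `dimensions` is what lets a same-dimension model swap be detected.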

Enabling Drift Detection

Pass your embedding model when creating a VectorDB:

import { createVectorDB } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5');

const db = await createVectorDB({
  name: 'my-docs',
  dimensions: 384,
  model, // Enables drift detection
});

On first use, the model's fingerprint is stored with the collection. On subsequent initializations, the stored fingerprint is compared to the current model.
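
The store-then-compare flow described above can be sketched as follows. This is an illustration under assumptions, not LocalMode's implementation: `fingerprintStore` is a hypothetical in-memory stand-in for the collection's persisted metadata, `openCollectionSketch` is a made-up name, and the sketch assumes the stored fingerprint is left untouched when drift is found.

```typescript
interface ModelFingerprint {
  modelId: string;
  provider: string;
  dimensions: number;
}

type DriftHandler = (stored: ModelFingerprint, current: ModelFingerprint) => void;

// Hypothetical in-memory stand-in for per-collection persisted metadata.
const fingerprintStore = new Map<string, ModelFingerprint>();

// Hypothetical init step: store on first use, compare on later initializations.
function openCollectionSketch(
  name: string,
  current: ModelFingerprint,
  onDrift: DriftHandler,
): 'stored' | 'match' | 'drift' {
  const stored = fingerprintStore.get(name);
  if (!stored) {
    fingerprintStore.set(name, current); // first use: remember who produced the vectors
    return 'stored';
  }
  const same =
    stored.modelId === current.modelId &&
    stored.provider === current.provider &&
    stored.dimensions === current.dimensions;
  if (!same) {
    onDrift(stored, current); // stands in for the modelDriftDetected event
    return 'drift';
  }
  return 'match';
}
```

Calling `openCollectionSketch` twice with the same fingerprint yields `'stored'` and then `'match'`; calling it afterwards with a different model yields `'drift'` and invokes the handler.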

Checking Compatibility

Use checkModelCompatibility() for a read-only check that does not modify storage or emit events:

import { checkModelCompatibility } from '@localmode/core';

const result = await checkModelCompatibility(db, newModel);

switch (result.status) {
  case 'compatible':
    console.log('Models match -- no action needed');
    break;
  case 'incompatible':
    console.log(`Model changed: ${result.storedModel?.modelId} -> ${result.currentModel.modelId}`);
    console.log(`${result.documentCount} documents may need re-embedding`);
    break;
  case 'dimension-mismatch':
    console.log('Dimension mismatch -- cannot use this model with existing data');
    break;
}

ModelCompatibilityResult

Field          Type                                                  Description
status         'compatible' | 'incompatible' | 'dimension-mismatch'  Compatibility status
storedModel    ModelFingerprint | null                               Stored fingerprint (null if none stored)
currentModel   ModelFingerprint                                      Current model's fingerprint
documentCount  number                                                Number of documents in the collection
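
One plausible way the three statuses relate to the two fingerprints is sketched below. The function `classifySketch` is hypothetical. The rule that a missing stored fingerprint counts as compatible comes from the Backward Compatibility notes on this page; checking dimensions before model identity is an assumption of this sketch, not documented behavior.

```typescript
interface ModelFingerprint {
  modelId: string;
  provider: string;
  dimensions: number;
}

type CompatibilityStatus = 'compatible' | 'incompatible' | 'dimension-mismatch';

// Hypothetical classification; the order of the checks is an assumption.
function classifySketch(
  stored: ModelFingerprint | null,
  current: ModelFingerprint,
): CompatibilityStatus {
  if (stored === null) return 'compatible'; // no fingerprint stored: nothing to compare
  if (stored.dimensions !== current.dimensions) return 'dimension-mismatch';
  if (stored.modelId !== current.modelId || stored.provider !== current.provider) {
    return 'incompatible'; // same size, different model: vectors need re-embedding
  }
  return 'compatible';
}
```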

Reindexing

When drift is detected, use reindexCollection() to re-embed all documents with the new model:

import { reindexCollection } from '@localmode/core';

const result = await reindexCollection(db, newModel, {
  batchSize: 32,
  onProgress: ({ completed, total, skipped, phase }) => {
    console.log(`${phase}: ${completed}/${total} (${skipped} skipped)`);
  },
});

console.log(`Reindexed ${result.reindexed}, skipped ${result.skipped} in ${result.durationMs}ms`);

How Text is Found

By default, reindexCollection() looks for source text in document metadata using these fields (in order):

_text, text, content, body, __text, pageContent

Documents without text in any of these fields are skipped (not re-embedded). The skip count is reported in the result and progress events.
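
The lookup order above can be sketched as a small function. `findDefaultText` is a hypothetical name, the metadata type is assumed to be a plain string-keyed record, and treating empty strings as "no text" is an assumption of this sketch.

```typescript
// Field names and their order are taken from the list above.
const DEFAULT_TEXT_FIELDS = ['_text', 'text', 'content', 'body', '__text', 'pageContent'] as const;

// Hypothetical lookup: return the first non-empty string field, or null to skip.
function findDefaultText(metadata: Record<string, unknown>): string | null {
  for (const field of DEFAULT_TEXT_FIELDS) {
    const value = metadata[field];
    if (typeof value === 'string' && value.length > 0) {
      return value;
    }
  }
  return null;
}

const sampleDocs: Record<string, unknown>[] = [
  { text: 'hello' },                     // found in "text"
  { text: '', pageContent: 'world' },    // empty "text", falls through to "pageContent"
  { title: 'no recognized text field' }, // would be skipped
];

const skippedCount = sampleDocs.filter((doc) => findDefaultText(doc) === null).length;
console.log(skippedCount); // 1
```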

Custom Text Extraction

If your documents store text in a non-standard field:

await reindexCollection(db, newModel, {
  textExtractor: (metadata) => {
    if (typeof metadata.rawContent === 'string') {
      return metadata.rawContent;
    }
    return null; // Skip this document
  },
});
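
An extractor can also combine several fields. The sketch below assumes the extractor receives a plain string-keyed record; the `title` and `summary` field names and the `titleAndSummary` function are hypothetical examples, not part of LocalMode.

```typescript
// Assumed extractor shape, based on the (metadata) => string | null option type.
type TextExtractor = (metadata: Record<string, unknown>) => string | null;

// Hypothetical extractor: join whichever of "title" and "summary" hold non-blank strings.
const titleAndSummary: TextExtractor = (metadata) => {
  const parts = [metadata['title'], metadata['summary']].filter(
    (part): part is string => typeof part === 'string' && part.trim().length > 0,
  );
  // Returning null marks the document as skipped.
  return parts.length > 0 ? parts.join('\n\n') : null;
};
```

A function like this could then be passed as the `textExtractor` option in place of the inline function shown above.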

ReindexOptions

Option         Type                                 Default  Description
abortSignal    AbortSignal                          --       Cancel the operation
onProgress     (progress: ReindexProgress) => void  --       Progress callback
queue          InferenceQueue                       --       Background scheduling via inference queue
batchSize      number                               50       Documents per embedding batch
textExtractor  (metadata) => string | null          --       Custom text extraction
textField      string                               '_text'  Primary metadata field for text

ReindexResult

Field       Type    Description
reindexed   number  Documents successfully re-embedded
skipped     number  Documents skipped (no text found)
durationMs  number  Total duration in milliseconds

Resumability

If a reindex operation is interrupted (tab closed, abort, crash), the progress cursor is persisted in the meta store. The next call to reindexCollection() with the same target model automatically resumes from where it left off.

A stale cursor (from a different target model) is discarded, and reindex starts fresh.
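
The cursor behavior can be pictured with the simplified, synchronous sketch below. Everything in it is hypothetical: `cursorStore` stands in for the meta store, the cursor shape (`targetModelId`, `nextIndex`) is assumed, and `stopAfterBatches` exists only to simulate an interruption. The real implementation may persist different state.

```typescript
// Hypothetical cursor shape: which model the run targets and where to continue.
interface ReindexCursor {
  targetModelId: string;
  nextIndex: number;
}

// Hypothetical in-memory stand-in for the persisted meta store.
const cursorStore = new Map<string, ReindexCursor>();

// Resume only if the saved cursor belongs to the same target model.
function resumeIndex(collection: string, targetModelId: string): number {
  const cursor = cursorStore.get(collection);
  if (cursor && cursor.targetModelId === targetModelId) return cursor.nextIndex;
  cursorStore.delete(collection); // stale (or missing) cursor: start fresh
  return 0;
}

// Returns the document indices processed during this call.
function reindexSketch(
  collection: string,
  docCount: number,
  targetModelId: string,
  batchSize: number,
  stopAfterBatches = Infinity, // simulates a closed tab or abort
): number[] {
  const processed: number[] = [];
  let next = resumeIndex(collection, targetModelId);
  let batches = 0;
  while (next < docCount && batches < stopAfterBatches) {
    const end = Math.min(next + batchSize, docCount);
    for (let i = next; i < end; i++) processed.push(i); // stand-in for re-embedding
    next = end;
    cursorStore.set(collection, { targetModelId, nextIndex: next }); // persist after each batch
    batches += 1;
  }
  if (next >= docCount) cursorStore.delete(collection); // finished: clear the cursor
  return processed;
}
```

With five documents and a batch size of two, a run interrupted after one batch processes indices 0 and 1; the next call for the same model continues with 2, 3, and 4, while a call for a different model starts again from 0.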

Inference Queue Integration

Submit reindex batches at 'background' priority so interactive operations are not blocked:

import { createInferenceQueue, reindexCollection } from '@localmode/core';

const queue = createInferenceQueue({ concurrency: 1 });

await reindexCollection(db, newModel, {
  queue, // Each batch runs at 'background' priority
});
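
To illustrate why background priority keeps interactive work responsive, here is a minimal priority-queue sketch. It is not LocalMode's InferenceQueue: the `SketchQueue` class is hypothetical, it always runs one job at a time, and the `'interactive'` priority name is an assumption (this page only names `'background'`).

```typescript
type Priority = 'interactive' | 'background';

interface Job {
  priority: Priority;
  run: () => Promise<unknown>;
  resolve: (value: unknown) => void;
  reject: (reason: unknown) => void;
}

// Hypothetical single-worker queue: pending interactive jobs always run first.
class SketchQueue {
  private pending: Job[] = [];
  private running = false;

  add<T>(run: () => Promise<T>, priority: Priority): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.pending.push({
        priority,
        run,
        resolve: (value) => resolve(value as T),
        reject,
      });
      void this.pump();
    });
  }

  private async pump(): Promise<void> {
    if (this.running) return; // a worker loop is already draining the queue
    this.running = true;
    while (this.pending.length > 0) {
      // Prefer the oldest interactive job; otherwise take the oldest job.
      const idx = this.pending.findIndex((job) => job.priority === 'interactive');
      const job = this.pending.splice(idx === -1 ? 0 : idx, 1)[0]!;
      try {
        job.resolve(await job.run());
      } catch (err) {
        job.reject(err);
      }
    }
    this.running = false;
  }
}
```

If two background batches and one interactive search are queued together, the batch that is already running finishes, then the search runs, and only then the second batch. A running batch is never preempted, which is one reason a smaller `batchSize` can make the UI feel more responsive.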

Cross-Tab Safety

reindexCollection() acquires an exclusive write lock, named reindex_{collectionId}, via the Web Locks API. Only one tab can reindex a collection at a time; other tabs wait until the lock is released.
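
The locking pattern can be sketched as below. `withReindexLock` and the `LockManagerLike` type are hypothetical; the sketch takes the lock manager as a parameter so it can run outside a browser. In a browser you would typically pass `navigator.locks`, whose `request()` method grants an exclusive lock to one holder at a time and queues later requests for the same name.

```typescript
// Minimal structural type covering the part of the Web Locks API used here.
interface LockManagerLike {
  request<T>(
    name: string,
    options: { mode: 'exclusive' | 'shared' },
    callback: () => Promise<T>,
  ): Promise<T>;
}

// Hypothetical wrapper: run `work` while holding the per-collection reindex lock.
async function withReindexLock<T>(
  locks: LockManagerLike,
  collectionId: string,
  work: () => Promise<T>,
): Promise<T> {
  // The lock is held until the promise returned by `work` settles.
  return locks.request(`reindex_${collectionId}`, { mode: 'exclusive' }, work);
}
```

Because the lock is released when the callback's promise settles, an error thrown during reindexing does not leave the collection locked.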

Events

Subscribe to drift detection and reindex lifecycle events via globalEventBus:

import { globalEventBus } from '@localmode/core';

globalEventBus.on('modelDriftDetected', ({ collection, storedModel, currentModel, documentCount }) => {
  console.warn(`Model drift in "${collection}": ${storedModel.modelId} -> ${currentModel.modelId}`);
});

globalEventBus.on('reindexStart', ({ collection, total, resumed }) => {
  console.log(`Reindex started: ${total} docs${resumed ? ' (resumed)' : ''}`);
});

globalEventBus.on('reindexProgress', ({ collection, completed, total, skipped, phase }) => {
  console.log(`${phase}: ${completed}/${total}`);
});

globalEventBus.on('reindexComplete', ({ collection, reindexed, skipped, durationMs }) => {
  console.log(`Done: ${reindexed} reindexed, ${skipped} skipped in ${durationMs}ms`);
});
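
Progress payloads are easy to turn into a status line for a UI. The helper below is a hypothetical sketch: `formatProgress` and the `ReindexProgressLike` type are not part of LocalMode, the payload fields mirror those destructured in the handlers above, and the sketch assumes `completed` counts toward `total`.

```typescript
// Assumed payload shape, mirroring the fields used in the handlers above.
interface ReindexProgressLike {
  completed: number;
  total: number;
  skipped: number;
  phase: string;
}

// Hypothetical formatter: guards against division by zero for empty collections.
function formatProgress(p: ReindexProgressLike): string {
  const percent = p.total > 0 ? Math.round((p.completed / p.total) * 100) : 0;
  return `${p.phase}: ${percent}% (${p.completed}/${p.total}, ${p.skipped} skipped)`;
}
```

Such a helper could be called from a `reindexProgress` handler or from the `onProgress` callback to update a label or progress bar.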

React Hook

The useReindex hook from @localmode/react wraps reindexCollection() with React state:

import { useReindex } from '@localmode/react';

function ReindexPanel({ db, newModel }) {
  const { isReindexing, progress, error, reindex, cancel, clearError } = useReindex({
    db,
    model: newModel,
  });

  return (
    <div>
      {isReindexing && progress && (
        <progress value={progress.completed} max={progress.total} />
      )}
      <button onClick={reindex} disabled={isReindexing}>Start Reindex</button>
      <button onClick={cancel} disabled={!isReindexing}>Cancel</button>
      {error && <p>{error.message} <button onClick={clearError}>Dismiss</button></p>}
    </div>
  );
}

UseReindexReturn

Field         Type                                 Description
isReindexing  boolean                              Whether reindexing is in progress
progress      ReindexProgress | null               Current progress
error         { message: string } | null           Error if failed
reindex       () => Promise<ReindexResult | null>  Start reindexing
cancel        () => void                           Cancel the operation
clearError    () => void                           Clear error state

Backward Compatibility

  • Collections created without a model option have no stored fingerprint and work exactly as before.
  • checkModelCompatibility() returns status: 'compatible' with storedModel: null for collections without a fingerprint.
  • No breaking changes to existing APIs.

Helper Functions

import { extractFingerprint, fingerprintsMatch } from '@localmode/core';

// Derive fingerprint from any EmbeddingModel
const fp = extractFingerprint(model);

// Compare two fingerprints
if (!fingerprintsMatch(storedFp, currentFp)) {
  console.log('Models differ');
}

Showcase Apps

App              Description                                Links
Semantic Search  Detect model drift and trigger reindexing  Demo · Source
