Threshold Calibration

Calibrate similarity thresholds for search from your own data, or fall back to per-model presets.

Overview

Different embedding models produce different similarity score distributions. A cosine similarity of 0.7 might represent a strong match for one model but a weak match for another. Choosing the right threshold for db.search() or semanticSearch() is critical for relevance filtering.

See it in action

Try Model Evaluator and Product Search for working demos of these APIs.

LocalMode provides two complementary features:

calibrateThreshold() -- Empirically calibrates a threshold from your actual corpus data
MODEL_THRESHOLD_PRESETS -- Known-good defaults for popular models when you need an instant answer

Both are entirely optional. Existing search behavior is unchanged when no threshold is provided.

Quick Start

import { calibrateThreshold, semanticSearch } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5');
const corpus = ['document 1 text...', 'document 2 text...', /* ... */];

const { threshold } = await calibrateThreshold({
  model,
  corpus,
  percentile: 90, // Threshold = score at the 90th percentile of pairwise similarities
});

// Use the calibrated threshold for search
const { results } = await semanticSearch({
  db,
  model,
  query: 'How to configure auth?',
  threshold,
});

Alternatively, skip calibration and look up a preset for the model:

import { getDefaultThreshold, semanticSearch } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5');
const threshold = getDefaultThreshold('Xenova/bge-small-en-v1.5');
// 0.5

const { results } = await semanticSearch({
  db,
  model,
  query: 'How to configure auth?',
  threshold, // undefined when the model has no preset, so no filtering is applied
});

calibrateThreshold()

Embeds a corpus sample, computes all pairwise similarity scores, and returns the score at a configurable percentile.

import { calibrateThreshold } from '@localmode/core';

const calibration = await calibrateThreshold({
  model,
  corpus: sampleTexts,
  percentile: 90,
  maxSamples: 200,
});

console.log(calibration.threshold);              // 0.52
console.log(calibration.distribution.mean);       // 0.38
console.log(calibration.distribution.stdDev);     // 0.12
console.log(calibration.sampleSize);              // 200
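
To make the steps concrete, here is a minimal sketch of the pairwise-similarity stage. It works on precomputed embedding vectors instead of calling a model. The names `cosineSimilarityScore` and `sketchCalibrate` are hypothetical, and the population standard deviation is an assumption; the library's internals may differ.

```typescript
type Vector = number[];

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarityScore(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical stand-in for the calibration step after embedding.
function sketchCalibrate(vectors: Vector[], percentile = 90) {
  // Score every unordered pair (i < j) exactly once.
  const scores: number[] = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      scores.push(cosineSimilarityScore(vectors[i], vectors[j]));
    }
  }
  if (scores.length === 0) throw new Error('Need at least two vectors');
  scores.sort((x, y) => x - y);

  const mean = scores.reduce((sum, x) => sum + x, 0) / scores.length;
  // Assumption: population variance (divide by N, not N - 1).
  const variance =
    scores.reduce((sum, x) => sum + (x - mean) ** 2, 0) / scores.length;

  // Nearest-rank index, clamped to the valid range.
  const rawIndex = Math.ceil((percentile / 100) * scores.length) - 1;
  const index = Math.min(scores.length - 1, Math.max(0, rawIndex));

  return {
    threshold: scores[index],
    distribution: { mean, stdDev: Math.sqrt(variance) },
    sampleSize: vectors.length,
  };
}
```

In this sketch the threshold depends only on how similar the corpus texts are to each other, so a tightly clustered corpus would produce a higher value than a diverse one.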

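CalibrateThresholdOptions

The tables below list only the fields that appear in the examples on this page.

Prop	Type	Description
model	embedding model	Model used to embed the corpus sample
corpus	string[]	Texts to sample and embed
percentile	number	Percentile of pairwise scores returned as the threshold (default: 90)
maxSamples	number	Maximum number of corpus texts to embed (default: 200)
distanceFunction	'cosine' | 'euclidean' | 'dot'	Similarity metric (default: 'cosine')
abortSignal	AbortSignal	Cancels an in-progress calibration

ThresholdCalibration (Result)

Prop	Type	Description
threshold	number	Score at the requested percentile
distribution	ThresholdDistributionStats	Summary statistics of the pairwise scores
sampleSize	number	Number of corpus texts sampled

ThresholdDistributionStats

Prop	Type	Description
mean	number	Mean of the pairwise scores
stdDev	number	Standard deviation of the pairwise scores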

Percentile Selection

The percentile parameter controls threshold strictness:

Percentile	Behavior	Use Case
70-80	Permissive, more results	Exploratory search, broad recall
90 (default)	Balanced	General semantic search
95-99	Strict, fewer but more precise results	High-precision applications

The threshold is computed using the nearest-rank method: index = ceil(percentile / 100 * count) - 1, clamped to [0, count - 1].
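
The nearest-rank formula above can be sketched as a small standalone function. The name `nearestRankPercentile` is hypothetical, and the input is assumed to be sorted in ascending order.

```typescript
// Returns the score at the given percentile using the nearest-rank method.
// `sortedScores` must be sorted ascending; `percentile` is on a 0-100 scale.
function nearestRankPercentile(sortedScores: number[], percentile: number): number {
  const count = sortedScores.length;
  if (count === 0) throw new Error('No scores to take a percentile of');
  const rawIndex = Math.ceil((percentile / 100) * count) - 1;
  // Clamp to [0, count - 1] so percentile 0 and 100 stay in bounds.
  const index = Math.min(count - 1, Math.max(0, rawIndex));
  return sortedScores[index];
}

const exampleScores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0];
console.log(nearestRankPercentile(exampleScores, 90)); // 0.9
console.log(nearestRankPercentile(exampleScores, 50)); // 0.5
```

Nearest-rank always returns a score that was actually observed, with no interpolation between neighbours, so the threshold corresponds to a real pair from the corpus.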

MODEL_THRESHOLD_PRESETS

A static map of known-good cosine similarity thresholds for popular models:

import { MODEL_THRESHOLD_PRESETS } from '@localmode/core';

console.log(MODEL_THRESHOLD_PRESETS);
// {
//   'Xenova/bge-small-en-v1.5': 0.5,
//   'Xenova/bge-base-en-v1.5': 0.5,
//   'Xenova/all-MiniLM-L6-v2': 0.68,
//   'Xenova/all-MiniLM-L12-v2': 0.7,
//   'nomic-ai/nomic-embed-text-v1.5': 0.55,
//   'Xenova/gte-small': 0.6,
//   'Xenova/gte-base': 0.6,
//   'Xenova/e5-small-v2': 0.6,
//   'Xenova/paraphrase-MiniLM-L6-v2': 0.72,
// }

Presets are approximate defaults for cosine similarity. For production use with domain-specific data, use calibrateThreshold() for a data-driven threshold.

getDefaultThreshold()

Safe lookup that returns undefined for unknown models:

import { getDefaultThreshold } from '@localmode/core';

const threshold = getDefaultThreshold('Xenova/bge-small-en-v1.5');
// 0.5

const unknown = getDefaultThreshold('unknown/model');
// undefined

This is useful for conditional threshold application:

const threshold = getDefaultThreshold(model.modelId);

const results = await db.search(queryVector, {
  k: 10,
  ...(threshold !== undefined && { threshold }),
});
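
As an illustration of the pattern, the sketch below reproduces the lookup and the conditional spread with local stand-ins. `PRESETS_SKETCH`, `lookupThreshold`, and `buildSearchOptions` are hypothetical names, and the map holds only two of the preset values listed above.

```typescript
// Local stand-in for the preset map (two entries copied from the list above).
const PRESETS_SKETCH: Readonly<Record<string, number>> = {
  'Xenova/bge-small-en-v1.5': 0.5,
  'Xenova/all-MiniLM-L6-v2': 0.68,
};

// Returns the preset for a model ID, or undefined when none is known.
// The own-property check avoids matching inherited keys such as 'toString'.
function lookupThreshold(modelId: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(PRESETS_SKETCH, modelId)
    ? PRESETS_SKETCH[modelId]
    : undefined;
}

interface SearchOptionsSketch {
  k: number;
  threshold?: number;
}

// Adds `threshold` only when a preset exists, so the key is absent otherwise.
function buildSearchOptions(modelId: string, k: number): SearchOptionsSketch {
  const threshold = lookupThreshold(modelId);
  return { k, ...(threshold !== undefined && { threshold }) };
}

console.log(buildSearchOptions('Xenova/bge-small-en-v1.5', 10)); // { k: 10, threshold: 0.5 }
console.log(buildSearchOptions('unknown/model', 10)); // { k: 10 }
```

Leaving the key out entirely, rather than passing `threshold: undefined`, keeps the options object identical to one built before thresholds were introduced.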

Distance Functions

By default, calibrateThreshold() uses cosine similarity. You can use other metrics:

// Default -- cosine similarity scores in [-1, 1]
const { threshold: cosineThreshold } = await calibrateThreshold({
  model,
  corpus,
  distanceFunction: 'cosine',
});

// Euclidean -- scores computed as 1 / (1 + distance), in (0, 1]
const { threshold: euclideanThreshold } = await calibrateThreshold({
  model,
  corpus,
  distanceFunction: 'euclidean',
});

// Dot product -- raw dot product scores (any real number)
const { threshold: dotThreshold } = await calibrateThreshold({
  model,
  corpus,
  distanceFunction: 'dot',
});
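
The score ranges noted in the comments above follow from how each metric is turned into a similarity score. The sketch below shows one way to compute the three scores for a pair of vectors. `similarityScore` is a hypothetical helper, not a library export.

```typescript
type DistanceFunctionSketch = 'cosine' | 'euclidean' | 'dot';

// Computes a similarity score for two equal-length vectors.
function similarityScore(
  a: number[],
  b: number[],
  fn: DistanceFunctionSketch,
): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  let squaredDiff = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
    squaredDiff += (a[i] - b[i]) ** 2;
  }
  switch (fn) {
    case 'cosine': {
      // In [-1, 1]; defined as 0 here when either vector is all zeros.
      const denom = Math.sqrt(normA) * Math.sqrt(normB);
      return denom === 0 ? 0 : dot / denom;
    }
    case 'euclidean':
      // Distance 0 maps to 1; larger distances approach 0.
      return 1 / (1 + Math.sqrt(squaredDiff));
    case 'dot':
      // Unbounded: depends on vector magnitudes as well as direction.
      return dot;
  }
}

console.log(similarityScore([3, 4], [3, 4], 'cosine')); // 1
console.log(similarityScore([3, 4], [3, 4], 'euclidean')); // 1
console.log(similarityScore([3, 4], [3, 4], 'dot')); // 25
```

Because the three scales differ, a threshold calibrated with one metric is not meaningful for another. Calibrate with the same metric the search uses.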

AbortSignal Support

Calibration supports cancellation via AbortSignal:

const controller = new AbortController();
setTimeout(() => controller.abort(), 10000); // 10s timeout

try {
  const { threshold } = await calibrateThreshold({
    model,
    corpus,
    abortSignal: controller.signal,
  });
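} catch (error) {
  // A cancelled calibration rejects with an AbortError; rethrow anything else
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('Calibration cancelled');
  } else {
    throw error;
  }
}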

React Hook

The useCalibrateThreshold() hook from @localmode/react wraps calibrateThreshold() with React state management:

import { useCalibrateThreshold } from '@localmode/react';

function ThresholdCalibrator({ model, corpus }) {
  const {
    calibration,
    isCalibrating,
    error,
    calibrate,
    cancel,
    clearError,
  } = useCalibrateThreshold({ model, percentile: 90 });

  return (
    <div>
      <button onClick={() => calibrate(corpus)} disabled={isCalibrating}>
        {isCalibrating ? 'Calibrating...' : 'Calibrate Threshold'}
      </button>
      {isCalibrating && <button onClick={cancel}>Cancel</button>}
      {calibration && (
        <div>
          <p>Threshold: {calibration.threshold.toFixed(4)}</p>
          <p>Mean: {calibration.distribution.mean.toFixed(4)}</p>
          <p>Samples: {calibration.sampleSize}</p>
        </div>
      )}
      {error && <p>Error: {error.message}</p>}
    </div>
  );
}

Performance

calibrateThreshold() computes n(n-1)/2 pairwise similarities, which is O(n^2) in the number of sampled texts n. The sample size is capped by maxSamples:

Samples	Pairs	Pairwise Time	Total (with embedding)
50	1,225	~1ms	~1-3s
100	4,950	~2ms	~2-5s
200 (default)	19,900	~5ms	~3-10s
500	124,750	~30ms	~5-20s

The embedding step dominates runtime. The pairwise computation is negligible for the default maxSamples of 200.
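
The Pairs column follows from counting unordered pairs, n(n-1)/2. A small sketch, assuming a corpus larger than maxSamples is reduced to exactly maxSamples texts (`pairCount` is a hypothetical helper):

```typescript
// Number of unordered pairs scored for a corpus of the given size.
// Assumption: the sample size is min(corpusSize, maxSamples).
function pairCount(corpusSize: number, maxSamples = 200): number {
  const n = Math.min(corpusSize, maxSamples);
  return (n * (n - 1)) / 2;
}

console.log(pairCount(50)); // 1225
console.log(pairCount(200)); // 19900
console.log(pairCount(10_000)); // 19900, capped by the default maxSamples
console.log(pairCount(500, 500)); // 124750
```

Doubling maxSamples roughly quadruples the pair count, but under the timings in the table the pairwise step stays in the millisecond range while embedding takes seconds.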

Integration with db.search()

The calibrated threshold integrates directly with the existing search API:

import { calibrateThreshold, semanticSearch } from '@localmode/core';

// Assumes `db`, `model`, `corpus`, and `queryVector` already exist in scope

// 1. Calibrate once at initialization
const { threshold } = await calibrateThreshold({ model, corpus });

// 2. Use with db.search()
const results = await db.search(queryVector, {
  k: 10,
  threshold, // Only results above this score are returned
});

// 3. Or with semanticSearch()
const { results: semanticResults } = await semanticSearch({
  db,
  model,
  query: 'my search query',
  threshold,
});

calibrateThreshold() is purely additive. When no threshold is passed to db.search(), all top-k results are returned as before.
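
To illustrate the additive behavior, the sketch below applies an optional threshold to a list of scored results before taking the top k. `ScoredResult` and `topKWithThreshold` are hypothetical names, and treating a score equal to the threshold as a match is an assumption; the library may use a strict comparison.

```typescript
interface ScoredResult {
  id: string;
  score: number;
}

// Keeps results at or above the threshold (if one is given), then returns
// the k highest-scoring results in descending order.
function topKWithThreshold(
  results: ScoredResult[],
  k: number,
  threshold?: number,
): ScoredResult[] {
  const kept =
    threshold === undefined
      ? results
      : results.filter((r) => r.score >= threshold);
  // Copy before sorting so the caller's array is not mutated.
  return [...kept].sort((a, b) => b.score - a.score).slice(0, k);
}

const candidates: ScoredResult[] = [
  { id: 'a', score: 0.81 },
  { id: 'b', score: 0.47 },
  { id: 'c', score: 0.63 },
];

console.log(topKWithThreshold(candidates, 10).map((r) => r.id)); // [ 'a', 'c', 'b' ]
console.log(topKWithThreshold(candidates, 10, 0.5).map((r) => r.id)); // [ 'a', 'c' ]
```

With a threshold, a search can return fewer than k results, or none at all, so calling code should handle an empty list.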

Showcase Apps

App	Description	Links
Model Evaluator	Calibrate similarity thresholds from labeled data	Demo · Source
Product Search	Threshold tuning for product similarity matching	Demo · Source
