Differential Privacy
Add mathematical privacy guarantees to embeddings and classification with calibrated noise mechanisms.
Differential privacy (DP) adds calibrated mathematical noise to outputs so that no single input can be identified from the result. LocalMode provides DP for both embeddings and classification, using cryptographically secure noise and privacy budget tracking.
See it in action
Try Document Redactor for a working demo of these APIs.
Threat Model
Embedding vectors encode semantic meaning of their inputs. Research has demonstrated that embedding inversion attacks can reconstruct original text from vectors with up to 92% fidelity. This means even if raw text is never stored, the vectors in IndexedDB expose user data.
Differential privacy mitigates this by adding noise calibrated to a privacy parameter epsilon. The guarantee: for any two inputs that differ by one record, the probability of producing any particular output changes by at most a factor of e^epsilon. Lower epsilon means stronger privacy (more noise), higher epsilon means weaker privacy (less noise).
Complementary with PII Redaction
DP noise is probabilistic protection on the output vectors. PII redaction is deterministic protection on the input text. For maximum security, combine both via composeEmbeddingMiddleware().
Noise Mechanisms
Two noise mechanisms are available, each with different privacy guarantees:
| Mechanism | Guarantee | Calibration | Best For |
|---|---|---|---|
| Gaussian (default) | (epsilon, delta)-DP | sigma = (sensitivity * sqrt(2 * ln(1.25 / delta))) / epsilon | General use, high-dimensional embeddings |
| Laplacian | Pure epsilon-DP (no delta) | scale = sensitivity / epsilon | When delta=0 is required |
Both mechanisms use crypto.getRandomValues() for cryptographically secure randomness. Math.random() is never used.
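The calibration formulas in the table can be written down directly. Below is a minimal sketch mirroring the computeGaussianSigma and computeLaplacianScale signatures listed in the API reference at the end of this page; the shipped versions may add input validation.

```typescript
// Gaussian calibration: sigma = (sensitivity * sqrt(2 * ln(1.25 / delta))) / epsilon
function computeGaussianSigma(sensitivity: number, epsilon: number, delta: number): number {
  return (sensitivity * Math.sqrt(2 * Math.log(1.25 / delta))) / epsilon;
}

// Laplacian calibration: scale = sensitivity / epsilon
function computeLaplacianScale(sensitivity: number, epsilon: number): number {
  return sensitivity / epsilon;
}

// Halving epsilon doubles the noise scale for both mechanisms:
console.log(computeLaplacianScale(2.0, 1.0)); // 2
console.log(computeGaussianSigma(2.0, 1.0, 1e-5));
```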
Direct Noise Generation
Generate noise vectors directly for custom use cases:
import { gaussianNoise, laplacianNoise, addNoise } from '@localmode/core';
// Gaussian noise: 384-dimensional, sigma = 0.1
const gNoise = gaussianNoise(384, 0.1);
// Laplacian noise: 384-dimensional, scale = 0.5
const lNoise = laplacianNoise(384, 0.5);
// Add noise to an embedding (element-wise addition)
const noisyEmbedding = addNoise(originalEmbedding, gNoise);
Box-Muller Transform
Gaussian noise is generated via the Box-Muller transform: given U1, U2 ~ Uniform(0,1), Z = sqrt(-2 * ln(U1)) * cos(2 * pi * U2) produces Z ~ Normal(0,1). Computation uses Float64 precision; results are stored as Float32.
dpEmbeddingMiddleware()
The primary way to use DP is through the embedding middleware, which integrates with wrapEmbeddingModel():
import { embed, wrapEmbeddingModel, dpEmbeddingMiddleware } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const baseModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const privateModel = wrapEmbeddingModel({
model: baseModel,
middleware: dpEmbeddingMiddleware({
epsilon: 1.0,
delta: 1e-5,
mechanism: 'gaussian',
}),
});
// Embeddings now have calibrated Gaussian noise
const { embedding } = await embed({
model: privateModel,
value: 'sensitive medical record',
});
const privateModel = wrapEmbeddingModel({
model: baseModel,
middleware: dpEmbeddingMiddleware({
epsilon: 1.0,
mechanism: 'laplacian', // Pure epsilon-DP, no delta needed
}),
});
const budget = await createPrivacyBudget({
maxEpsilon: 10.0,
persistKey: 'medical-notes',
onExhausted: 'block',
});
const privateModel = wrapEmbeddingModel({
model: baseModel,
middleware: dpEmbeddingMiddleware({ epsilon: 1.0 }, budget),
});
// Each call consumes 1.0 epsilon
await embed({ model: privateModel, value: 'record 1' }); // 9.0 remaining
await embed({ model: privateModel, value: 'record 2' }); // 8.0 remaining
DPEmbeddingConfig
The second argument to dpEmbeddingMiddleware() is an optional PrivacyBudget instance. When provided, budget.consume(epsilon) is called before each embed operation.
Privacy Budget
Sequential composition states that total privacy loss is the sum of per-query epsilons. The privacy budget tracker enforces an upper bound on cumulative epsilon across operations.
createPrivacyBudget()
import { createPrivacyBudget } from '@localmode/core';
const budget = await createPrivacyBudget({
maxEpsilon: 10.0,
persistKey: 'my-app-budget',
onExhausted: 'block',
});
// Consume epsilon manually or via middleware
budget.consume(1.0);
console.log(budget.remaining()); // 9.0
console.log(budget.consumed()); // 1.0
console.log(budget.isExhausted()); // false
// Reset when privacy period expires
budget.reset();
// Clean up persisted state
await budget.destroy();
PrivacyBudgetConfig
PrivacyBudget Interface
Exhaustion Policies
const budget = await createPrivacyBudget({
maxEpsilon: 5.0,
onExhausted: 'warn',
});
// After budget is spent, operations continue with a console.warn()
budget.consume(5.0);
budget.consume(1.0); // Logs: "[LocalMode] Privacy budget exhausted: consumed 6.00 of 5.00 epsilon"
// Operation still proceeds
import { PrivacyBudgetExhaustedError } from '@localmode/core';
const budget = await createPrivacyBudget({
maxEpsilon: 5.0,
onExhausted: 'block',
});
budget.consume(5.0);
try {
budget.consume(1.0); // Throws!
} catch (error) {
if (error instanceof PrivacyBudgetExhaustedError) {
console.log(error.maxEpsilon); // 5.0
console.log(error.consumedEpsilon); // 5.0 (the failed consume is rolled back)
// Handle: reset budget, switch to non-DP mode, or show UI notification
}
}
Persistence
When persistKey is provided, budget state is stored in IndexedDB and restored across browser sessions:
// Session 1
const budget = await createPrivacyBudget({
maxEpsilon: 10.0,
persistKey: 'user-vectors',
});
budget.consume(3.0);
// Page closes — state is persisted
// Session 2
const budget2 = await createPrivacyBudget({
maxEpsilon: 10.0,
persistKey: 'user-vectors',
});
console.log(budget2.consumed()); // 3.0 (restored from IndexedDB)
console.log(budget2.remaining()); // 7.0
When persistKey is omitted, the budget is tracked in memory only and resets on page reload.
Sensitivity Calibration
Sensitivity is the maximum change in embedding output (L2 norm) when a single input changes. It determines how much noise is needed for a given epsilon.
Lookup Table
For known models, getSensitivity() returns the pre-computed sensitivity:
import { getSensitivity } from '@localmode/core';
getSensitivity('Xenova/bge-small-en-v1.5'); // 2.0
getSensitivity('Xenova/all-MiniLM-L6-v2'); // 2.0
getSensitivity('unknown-model'); // 2.0 (default for normalized models)
All models that produce unit-normalized embeddings (L2 norm = 1) have a theoretical sensitivity bound of 2.0 -- the maximum L2 distance between two unit vectors.
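The 2.0 bound is easy to check: the two farthest-apart unit vectors are a vector and its negation. A quick sketch:

```typescript
// Worst case for unit-normalized embeddings: v and -v are 2 apart.
function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

const v = [0.6, 0.8]; // unit vector: 0.36 + 0.64 = 1
const negV = v.map((x) => -x);
console.log(l2Distance(v, negV)); // ~2.0, the maximum possible distance
```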
Runtime Calibration
For models not in the lookup table, estimate sensitivity empirically:
import { calibrateSensitivity } from '@localmode/core';
const sensitivity = await calibrateSensitivity(model);
console.log(`Empirical sensitivity: ${sensitivity}`);
// With custom samples
const sensitivity2 = await calibrateSensitivity(model, [
'First sample text',
'Second sample text',
'Third very different text',
]);
calibrateSensitivity() embeds a diverse set of texts (10 built-in samples or custom ones), computes all pairwise L2 distances, and returns the maximum distance with a 10% safety margin. If all embeddings are unit-normalized, it short-circuits and returns 2.0.
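That procedure can be sketched as below. EmbedFn is a hypothetical stand-in for a model's embed call, not the library interface, and the unit-norm short-circuit is omitted for brevity:

```typescript
// Empirical calibration sketch: embed samples, take the maximum
// pairwise L2 distance, then add a 10% safety margin.
type EmbedFn = (text: string) => Promise<Float32Array>;

async function calibrateSensitivitySketch(embedText: EmbedFn, samples: string[]): Promise<number> {
  const vectors = await Promise.all(samples.map(embedText));
  let maxDistance = 0;
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      let sq = 0;
      for (let d = 0; d < vectors[i].length; d++) {
        sq += (vectors[i][d] - vectors[j][d]) ** 2;
      }
      maxDistance = Math.max(maxDistance, Math.sqrt(sq));
    }
  }
  return maxDistance * 1.1; // 10% safety margin
}
```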
Auto Sensitivity in Middleware
When sensitivity is set to 'auto' (the default), dpEmbeddingMiddleware resolves sensitivity in this order:
- Explicit modelId in config -- looked up in the known sensitivities table
- Wrapped model's modelId -- looked up in the known sensitivities table
- Default 2.0 -- safe upper bound for normalized embeddings
// Auto-detect from the wrapped model's ID
const privateModel = wrapEmbeddingModel({
model: transformers.embedding('Xenova/bge-small-en-v1.5'),
middleware: dpEmbeddingMiddleware({ epsilon: 1.0 }), // sensitivity auto-resolved to 2.0
});
// Override with explicit sensitivity
const customModel = wrapEmbeddingModel({
model: myCustomModel,
middleware: dpEmbeddingMiddleware({
epsilon: 1.0,
sensitivity: 1.5, // Explicit value from your own calibration
}),
});
Classification DP (Randomized Response)
For classification outputs, DP uses randomized response instead of continuous noise. Given k possible labels, the true label is returned with probability p = e^epsilon / (e^epsilon + k - 1), and each other label with probability 1 / (e^epsilon + k - 1).
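The truth probability p can be computed directly from the formula; a small sketch (trueLabelProbability is illustrative, not a library export):

```typescript
// Probability that randomized response returns the true label:
// p = e^epsilon / (e^epsilon + k - 1), where k is the number of labels.
function trueLabelProbability(epsilon: number, numLabels: number): number {
  const e = Math.exp(epsilon);
  return e / (e + numLabels - 1);
}

console.log(trueLabelProbability(10.0, 3)); // ~0.9999 -- almost always truthful
console.log(trueLabelProbability(0.5, 3));  // ~0.45 -- close to uniform (1/3)
```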
randomizedResponse()
import { randomizedResponse } from '@localmode/core';
// High epsilon = low privacy, high utility (true label almost always returned)
const label1 = randomizedResponse(
'positive', // true label
['positive', 'negative', 'neutral'], // all possible labels
10.0 // epsilon
);
// label1 is almost certainly 'positive'
// Low epsilon = high privacy, low utility (label is nearly uniform random)
const label2 = randomizedResponse(
'positive',
['positive', 'negative', 'neutral'],
0.5
);
// label2 could be any of the three labels
dpClassificationMiddleware()
Apply randomized response as middleware on classification models:
import { dpClassificationMiddleware } from '@localmode/core';
const middleware = dpClassificationMiddleware({
epsilon: 2.0,
labels: ['positive', 'negative', 'neutral'],
});
When the randomized response flips a label, the score is set to 1 / labels.length (uniform prior) and allScores is cleared to prevent leaking the original distribution. When the label is preserved, scores remain unchanged.
DPClassificationConfig
Parameter Tuning Guide
Choosing epsilon is a tradeoff between privacy and utility. Here is a practical guide:
Epsilon Ranges
| Epsilon | Privacy Level | Noise Impact | Use Case |
|---|---|---|---|
| 0.1 - 0.5 | Strong | High noise, significant recall loss | Highly sensitive data (medical, legal) |
| 1.0 - 3.0 | Moderate | Moderate noise, less than 15% recall degradation | General privacy-sensitive applications |
| 3.0 - 10.0 | Weak | Low noise, minimal recall impact | Compliance or audit requirements |
Epsilon vs Recall Tradeoff
At epsilon 1.0 with Gaussian mechanism and 384-dimensional embeddings:
- Sigma is approximately 0.1 for normalized embeddings (sensitivity=2.0, delta=1e-5)
- Top-10 recall typically drops less than 10%
- Cosine similarity ordering is largely preserved for well-separated clusters
At epsilon 0.5, noise doubles and recall may drop 15-25%. At epsilon 3.0, noise is minimal and recall loss is typically under 5%.
Start with epsilon=1.0
A good starting point is epsilon=1.0 with the Gaussian mechanism. Measure your search recall on a test set, then adjust: decrease epsilon if privacy is paramount, increase if utility is too degraded.
Delta Guidelines
Delta represents the probability that the (epsilon, delta)-DP guarantee fails. Standard practice:
- Set delta < 1/n where n is the number of records in your dataset
- The default 1e-5 is suitable for datasets up to ~100,000 records
- For larger datasets, consider 1e-7 or smaller
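These guidelines can be folded into a tiny helper. suggestDelta is illustrative only (not a library function) and leaves an order of magnitude of headroom below 1/n:

```typescript
// Pick a delta at least 10x below 1/n, capped at the 1e-5 default.
function suggestDelta(numRecords: number): number {
  return Math.min(1e-5, 1 / (10 * numRecords));
}

console.log(suggestDelta(1_000));      // 1e-5 (the default already satisfies delta < 1/n)
console.log(suggestDelta(10_000_000)); // 1e-8
```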
Composing with Other Middleware
For maximum privacy, compose PII redaction (deterministic, on input text) with DP noise (probabilistic, on output vectors):
import {
wrapEmbeddingModel,
composeEmbeddingMiddleware,
piiRedactionMiddleware,
dpEmbeddingMiddleware,
} from '@localmode/core';
const secureModel = wrapEmbeddingModel({
model: baseModel,
middleware: composeEmbeddingMiddleware([
piiRedactionMiddleware({ patterns: ['email', 'phone', 'ssn'] }),
dpEmbeddingMiddleware({ epsilon: 1.0 }),
]),
});
// Input text is PII-redacted, then the embedding has calibrated noise added
const { embedding } = await embed({
model: secureModel,
value: 'Patient John Doe, email john@example.com, diagnosed with...',
});
Middleware Order
PII redaction uses transformParams to modify input text before embedding. DP noise uses wrapEmbed to modify output vectors after embedding. Order in composeEmbeddingMiddleware ensures both apply correctly.
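To illustrate why the two hooks compose cleanly, here is a heavily simplified sketch of the two stages. The types and applyMiddleware helper are hypothetical stand-ins, not the library's actual middleware interface: transformParams runs before the model sees the input, wrapEmbed runs around the model call and can post-process the output.

```typescript
// Simplified stand-in types (not the real @localmode/core interface).
type EmbedParams = { value: string };
type EmbedResult = { embedding: number[] };

interface MiddlewareSketch {
  transformParams?: (params: EmbedParams) => EmbedParams;
  wrapEmbed?: (doEmbed: (p: EmbedParams) => EmbedResult, params: EmbedParams) => EmbedResult;
}

// Apply one middleware around a base embed function.
function applyMiddleware(
  base: (p: EmbedParams) => EmbedResult,
  mw: MiddlewareSketch,
): (p: EmbedParams) => EmbedResult {
  return (params) => {
    const transformed = mw.transformParams ? mw.transformParams(params) : params;
    return mw.wrapEmbed ? mw.wrapEmbed(base, transformed) : base(transformed);
  };
}

// Redaction-style input transform + noise-style output transform in one middleware.
const base = (p: EmbedParams): EmbedResult => ({ embedding: [p.value.length] });
const combined: MiddlewareSketch = {
  transformParams: (p) => ({ value: p.value.replace(/\d/g, "#") }), // acts on input text
  wrapEmbed: (doEmbed, p) => {
    const { embedding } = doEmbed(p);
    return { embedding: embedding.map((x) => x + 0.5) }; // acts on output vector
  },
};
console.log(applyMiddleware(base, combined)({ value: "id 42" }).embedding); // [5.5]
```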
Full Secure Pipeline with Budget
import {
wrapEmbeddingModel,
composeEmbeddingMiddleware,
piiRedactionMiddleware,
dpEmbeddingMiddleware,
createPrivacyBudget,
createVectorDB,
embed,
} from '@localmode/core';
import { transformers } from '@localmode/transformers';
// 1. Create a privacy budget that persists across sessions
const budget = await createPrivacyBudget({
maxEpsilon: 50.0,
persistKey: 'patient-records',
onExhausted: 'block',
});
// 2. Wrap the model with PII redaction + DP noise
const secureModel = wrapEmbeddingModel({
model: transformers.embedding('Xenova/bge-small-en-v1.5'),
middleware: composeEmbeddingMiddleware([
piiRedactionMiddleware({ patterns: ['email', 'phone', 'ssn'] }),
dpEmbeddingMiddleware({ epsilon: 1.0, mechanism: 'gaussian' }, budget),
]),
});
// 3. Create a vector database
const db = await createVectorDB({ name: 'records', dimensions: 384 });
// 4. Embed and store — budget is consumed automatically
const { embedding } = await embed({
model: secureModel,
value: 'Patient record: diagnosed with condition X',
});
await db.add({ id: 'rec-1', vector: embedding, metadata: { type: 'diagnosis' } });
console.log(`Budget remaining: ${budget.remaining()}`); // 49.0
Error Handling
PrivacyBudgetExhaustedError
Thrown when a budget with onExhausted: 'block' is exceeded:
import { PrivacyBudgetExhaustedError } from '@localmode/core';
try {
budget.consume(1.0);
} catch (error) {
if (error instanceof PrivacyBudgetExhaustedError) {
console.log(error.code); // 'PRIVACY_BUDGET_EXHAUSTED'
console.log(error.maxEpsilon); // Total budget
console.log(error.consumedEpsilon); // Amount consumed before the failed call
console.log(error.hint); // Actionable guidance
}
}
The failed consume() call is rolled back -- consumedEpsilon reflects the state before the blocked operation.
API Reference
Functions
| Function | Signature | Description |
|---|---|---|
| dpEmbeddingMiddleware | (config: DPEmbeddingConfig, budget?: PrivacyBudget) => EmbeddingModelMiddleware | Create DP middleware for embedding models |
| dpClassificationMiddleware | (config: DPClassificationConfig) => ClassificationModelMiddleware | Create DP middleware for classification models |
| createPrivacyBudget | (config: PrivacyBudgetConfig) => Promise<PrivacyBudget> | Create a privacy budget tracker |
| randomizedResponse | (trueLabel: string, allLabels: string[], epsilon: number) => string | Apply randomized response to a label |
| gaussianNoise | (dimensions: number, sigma: number) => Float32Array | Generate Gaussian noise vector |
| laplacianNoise | (dimensions: number, scale: number) => Float32Array | Generate Laplacian noise vector |
| addNoise | (embedding: Float32Array, noise: Float32Array) => Float32Array | Add noise to an embedding (element-wise) |
| getSensitivity | (modelId?: string) => number | Look up model sensitivity (returns 2.0 for unknown models) |
| calibrateSensitivity | (model: EmbeddingModel, samples?: string[]) => Promise<number> | Estimate sensitivity empirically from sample embeddings |
| computeGaussianSigma | (sensitivity: number, epsilon: number, delta: number) => number | Compute Gaussian noise sigma from DP parameters |
| computeLaplacianScale | (sensitivity: number, epsilon: number) => number | Compute Laplacian noise scale from DP parameters |
Types
| Type | Description |
|---|---|
| DPEmbeddingConfig | Configuration for DP embedding middleware |
| DPClassificationConfig | Configuration for DP classification middleware |
| PrivacyBudgetConfig | Configuration for privacy budget creation |
| PrivacyBudget | Privacy budget tracker interface |
Next Steps
Security
Encryption, PII redaction, and security best practices.
Middleware
Learn about embedding and VectorDB middleware composition.
Embeddings
Core embedding functions and options.