
Embeddings

Generate dense vector representations of text for semantic search, clustering, and similarity matching.

For the full API reference (embed(), embedMany(), streamEmbedMany(), options, result types, middleware, and custom providers), see the Core Embeddings guide.

See it in action

Try Semantic Search and PDF Search for working demos.

Model Configuration

import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5', {
  quantized: true,     // Use quantized model (default: true)
  device: 'webgpu',    // 'webgpu' or 'wasm' (auto-detected if omitted)
  onProgress: (p) => {
    console.log(`Loading model: ${(p.progress * 100).toFixed(1)}%`);
  },
});
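Creating a model typically triggers a weight download, so it is common to construct the model once and reuse the instance across calls. Whether transformers.embedding() deduplicates loads internally is not covered here, so treat the following as a general pattern rather than LocalMode behavior; the memoize helper and the stand-in factory are illustrative only:

```typescript
// Memoize an expensive factory so each model id is created only once.
function memoize<T>(factory: (id: string) => T): (id: string) => T {
  const cache = new Map<string, T>();
  return (id) => {
    if (!cache.has(id)) cache.set(id, factory(id));
    return cache.get(id)!; // non-null: we just set it on a miss
  };
}

// Stand-in for transformers.embedding(id); swap in the real factory in your app.
const getModel = memoize((id) => ({ id }));

const a = getModel('Xenova/bge-small-en-v1.5');
const b = getModel('Xenova/bge-small-en-v1.5');
// a and b are the same instance, so the underlying load happens once
```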

Device Selection

Choose between WebGPU (faster, requires hardware support) and WASM (universal fallback):

import { transformers, isWebGPUAvailable } from '@localmode/transformers';

const device = (await isWebGPUAvailable()) ? 'webgpu' : 'wasm';

const model = transformers.embedding('Xenova/bge-small-en-v1.5', {
  device,
  quantized: true,
});
| Device | Performance | Compatibility | When to Use |
| --- | --- | --- | --- |
| 'webgpu' | Fastest (GPU-accelerated) | Chrome 113+, Edge 113+, Safari 18+ | Default when available |
| 'wasm' | Good (CPU) | All modern browsers | Universal fallback |
| (omitted) | Auto-detected | - | Recommended for most apps |
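isWebGPUAvailable() is provided by the package; if you want to see what such a check involves, the standard entry point is navigator.gpu, where requestAdapter() resolves to null when no usable GPU exists. A minimal self-contained sketch (not the library's implementation):

```typescript
// Feature-detect WebGPU and fall back to WASM.
type Device = 'webgpu' | 'wasm';

async function detectDevice(): Promise<Device> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return 'wasm'; // API absent (older browser or non-browser runtime)
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm'; // API present but no usable adapter
  } catch {
    return 'wasm'; // treat detection failures as "unsupported"
  }
}
```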
Available Models

| Model | Dimensions | Size | Speed | Use Case |
| --- | --- | --- | --- | --- |
| Snowflake/snowflake-arctic-embed-xs | 384 | 23MB | ⚡⚡⚡ | Tiny, best retrieval for size |
| Xenova/bge-small-en-v1.5 | 384 | 33MB | ⚡⚡⚡ | General purpose, recommended |
| Xenova/bge-base-en-v1.5 | 768 | 110MB | ⚡⚡ | Higher quality |
| Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | 117MB | ⚡⚡ | 50+ languages |

Multilingual Embeddings

For multilingual applications:

import { embedMany } from '@localmode/core';

const model = transformers.embedding('Xenova/paraphrase-multilingual-MiniLM-L12-v2');

const { embeddings } = await embedMany({
  model,
  values: [
    'Hello world',           // English
    'Bonjour le monde',      // French
    'Hola mundo',            // Spanish
    'こんにちは世界',          // Japanese
    'مرحبا بالعالم',         // Arabic
  ],
});

// All embeddings are in the same vector space
// Cross-lingual similarity works!

Comparison: Model Quality vs Speed

import { cosineSimilarity, embedMany } from '@localmode/core';

// Test sentences
const s1 = 'The cat sits on the mat';
const s2 = 'A feline rests on a rug';
const s3 = 'The stock market crashed yesterday';

// Fast model
const fastModel = transformers.embedding('Xenova/bge-small-en-v1.5');
const { embeddings: fastEmbeddings } = await embedMany({
  model: fastModel,
  values: [s1, s2, s3],
});

// Quality model
const qualityModel = transformers.embedding('Xenova/all-mpnet-base-v2');
const { embeddings: qualityEmbeddings } = await embedMany({
  model: qualityModel,
  values: [s1, s2, s3],
});

// Compare similarities
console.log('Fast model:');
console.log('  s1-s2:', cosineSimilarity(fastEmbeddings[0], fastEmbeddings[1]).toFixed(3));
console.log('  s1-s3:', cosineSimilarity(fastEmbeddings[0], fastEmbeddings[2]).toFixed(3));

console.log('Quality model:');
console.log('  s1-s2:', cosineSimilarity(qualityEmbeddings[0], qualityEmbeddings[1]).toFixed(3));
console.log('  s1-s3:', cosineSimilarity(qualityEmbeddings[0], qualityEmbeddings[2]).toFixed(3));
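cosineSimilarity above comes from @localmode/core; for intuition, a reference implementation is just the dot product of the two vectors divided by the product of their lengths. A self-contained sketch (illustrative, not the library's code):

```typescript
// Cosine similarity ranges from -1 (opposite direction) to 1 (same direction).
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// cosineSim([1, 0], [1, 0]) → 1 (identical)
// cosineSim([1, 0], [0, 1]) → 0 (orthogonal)
```

With either model, the paraphrase pair s1-s2 should score noticeably higher than the unrelated pair s1-s3; the larger model generally separates them more cleanly.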

Caching Embeddings

Use caching middleware to avoid recomputation:

import { wrapEmbeddingModel, cachingMiddleware, embed } from '@localmode/core';

const baseModel = transformers.embedding('Xenova/bge-small-en-v1.5');

const model = wrapEmbeddingModel(baseModel, [
  cachingMiddleware({
    maxSize: 10000,
    storage: 'indexeddb',
    dbName: 'embedding-cache',
  }),
]);

// First call computes embedding
const { embedding: e1 } = await embed({ model, value: 'Hello' });

// Second call returns from cache (instant)
const { embedding: e2 } = await embed({ model, value: 'Hello' });

Best Practices

Embedding Tips

  1. Match dimensions — Ensure your vector DB dimensions match the model
  2. Batch when possible — embedMany() is more efficient than multiple embed() calls
  3. Cache embeddings — Use caching middleware for repeated queries
  4. Normalize if needed — Some models benefit from L2 normalization
  5. Choose model wisely — Balance quality vs speed for your use case
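On tip 4: L2 normalization scales a vector to unit length, after which cosine similarity reduces to a plain dot product. Whether it helps depends on the model, so measure before adopting it. A minimal sketch:

```typescript
// L2-normalize a vector: divide every component by the vector's length.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return v.slice(); // leave the zero vector unchanged
  return v.map((x) => x / norm);
}

// Example: [3, 4] has length 5, so it normalizes to [0.6, 0.8].
```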

Showcase Apps

| App | Description | Links |
| --- | --- | --- |
| Semantic Search | Generate embeddings for document search | Demo · Source |
| Product Search | Embed product descriptions for catalog search | Demo · Source |
| PDF Search | Embed PDF chunks for document retrieval | Demo · Source |
