# Embeddings
Generate dense vector representations of text for semantic search, clustering, and similarity matching.
For the full API reference (`embed()`, `embedMany()`, `streamEmbedMany()`, options, result types, middleware, and custom providers), see the Core Embeddings guide.
## See it in action
Try Semantic Search and PDF Search for working demos.
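
For orientation, here is a minimal end-to-end sketch, assuming the `embed()` helper from `@localmode/core` described in the Core Embeddings guide:

```ts
import { embed } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Load a small general-purpose model (configuration details below).
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// embed() resolves to an object containing the vector itself.
const { embedding } = await embed({ model, value: 'Local-first semantic search' });
console.log(embedding.length); // 384 dimensions for bge-small-en-v1.5
```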
## Model Configuration
```ts
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/bge-small-en-v1.5', {
  quantized: true, // Use quantized model (default: true)
  device: 'webgpu', // 'webgpu' or 'wasm' (auto-detected if omitted)
  onProgress: (p) => {
    console.log(`Loading model: ${(p.progress * 100).toFixed(1)}%`);
  },
});
```

## Device Selection
Choose between WebGPU (faster, requires hardware support) and WASM (universal fallback):
```ts
import { transformers, isWebGPUAvailable } from '@localmode/transformers';

const device = (await isWebGPUAvailable()) ? 'webgpu' : 'wasm';

const model = transformers.embedding('Xenova/bge-small-en-v1.5', {
  device,
  quantized: true,
});
```

| Device | Performance | Compatibility | When to Use |
|---|---|---|---|
| `'webgpu'` | Fastest (GPU-accelerated) | Chrome 113+, Edge 113+, Safari 18+ | Default when available |
| `'wasm'` | Good (CPU) | All modern browsers | Universal fallback |
| (omitted) | Auto-detected | - | Recommended for most apps |
## Recommended Models
| Model | Dimensions | Size | Speed | Use Case |
|---|---|---|---|---|
| `Snowflake/snowflake-arctic-embed-xs` | 384 | 23MB | ⚡⚡⚡ | Tiny, best retrieval for size |
| `Xenova/bge-small-en-v1.5` | 384 | 33MB | ⚡⚡⚡ | General purpose, recommended |
| `Xenova/bge-base-en-v1.5` | 768 | 110MB | ⚡⚡ | Higher quality |
| `Xenova/paraphrase-multilingual-MiniLM-L12-v2` | 384 | 117MB | ⚡⚡ | 50+ languages |
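
Any model ID from the table can be passed straight to `transformers.embedding()`. As a minimal sketch, loading the smallest model above:

```ts
import { transformers } from '@localmode/transformers';

// Smallest model in the table: 384 dimensions, ~23MB download.
// Your vector store's dimension must match this output size.
const tinyModel = transformers.embedding('Snowflake/snowflake-arctic-embed-xs');
```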
## Multilingual Embeddings
For multilingual applications:
```ts
import { cosineSimilarity, embedMany } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/paraphrase-multilingual-MiniLM-L12-v2');

const { embeddings } = await embedMany({
  model,
  values: [
    'Hello world', // English
    'Bonjour le monde', // French
    'Hola mundo', // Spanish
    'こんにちは世界', // Japanese
    'مرحبا بالعالم', // Arabic
  ],
});
// All embeddings are in the same vector space
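// For example, translations of the same sentence score highly against each
// other (illustrative sketch; exact scores depend on the model):
const enFr = cosineSimilarity(embeddings[0], embeddings[1]);
const enJa = cosineSimilarity(embeddings[0], embeddings[3]);
console.log('en-fr:', enFr.toFixed(3), 'en-ja:', enJa.toFixed(3));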
// Cross-lingual similarity works!
```

## Comparison: Model Quality vs Speed
```ts
import { cosineSimilarity, embedMany } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Test sentences: s1 and s2 are paraphrases; s3 is unrelated
const s1 = 'The cat sits on the mat';
const s2 = 'A feline rests on a rug';
const s3 = 'The stock market crashed yesterday';

// Fast model
const fastModel = transformers.embedding('Xenova/bge-small-en-v1.5');
const { embeddings: fastEmbeddings } = await embedMany({
  model: fastModel,
  values: [s1, s2, s3],
});

// Quality model
const qualityModel = transformers.embedding('Xenova/all-mpnet-base-v2');
const { embeddings: qualityEmbeddings } = await embedMany({
  model: qualityModel,
  values: [s1, s2, s3],
});

// Compare similarities
console.log('Fast model:');
console.log(' s1-s2:', cosineSimilarity(fastEmbeddings[0], fastEmbeddings[1]).toFixed(3));
console.log(' s1-s3:', cosineSimilarity(fastEmbeddings[0], fastEmbeddings[2]).toFixed(3));
console.log('Quality model:');
console.log(' s1-s2:', cosineSimilarity(qualityEmbeddings[0], qualityEmbeddings[1]).toFixed(3));
console.log(' s1-s3:', cosineSimilarity(qualityEmbeddings[0], qualityEmbeddings[2]).toFixed(3));
```

Both models should score the paraphrase pair (s1-s2) well above the unrelated pair (s1-s3); larger models typically separate the two more cleanly.

## Caching Embeddings
Use caching middleware to avoid recomputation:
```ts
import { embed, wrapEmbeddingModel, cachingMiddleware } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const baseModel = transformers.embedding('Xenova/bge-small-en-v1.5');

const model = wrapEmbeddingModel(baseModel, [
  cachingMiddleware({
    maxSize: 10000,
    storage: 'indexeddb',
    dbName: 'embedding-cache',
  }),
]);

// First call computes the embedding
const { embedding: e1 } = await embed({ model, value: 'Hello' });

// Second call returns it from the cache (instant)
const { embedding: e2 } = await embed({ model, value: 'Hello' });
```

## Best Practices
**Embedding Tips**

- **Match dimensions**: ensure your vector DB's dimensions match the model's output dimensions
- **Batch when possible**: `embedMany()` is more efficient than multiple `embed()` calls
- **Cache embeddings**: use caching middleware for repeated queries
- **Normalize if needed**: some models benefit from L2 normalization (see the sketch below)
- **Choose your model wisely**: balance quality vs speed for your use case
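
Where a model does not normalize its output, a plain TypeScript helper is enough; this is an illustrative sketch, not a `@localmode/core` API. With unit-length vectors, cosine similarity reduces to a dot product:

```ts
// Scale a vector to unit length (L2 norm = 1).
// Illustrative helper; check whether your model already returns normalized vectors.
function l2Normalize(vector: number[]): number[] {
  const norm = Math.hypot(...vector);
  return norm === 0 ? vector : vector.map((v) => v / norm);
}
```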
## Showcase Apps
| App | Description | Links |
|---|---|---|
| Semantic Search | Generate embeddings for document search | Demo · Source |
| Product Search | Embed product descriptions for catalog search | Demo · Source |
| PDF Search | Embed PDF chunks for document retrieval | Demo · Source |