
Embeddings

Generate dense vector representations of text for semantic search, clustering, and similarity matching.

Basic Usage

import { transformers } from '@localmode/transformers';
import { embed, embedMany } from '@localmode/core';

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2');

// Single embedding
const { embedding } = await embed({
  model,
  value: 'Machine learning is fascinating',
});

console.log('Dimensions:', embedding.length); // 384

// Batch embeddings
const { embeddings } = await embedMany({
  model,
  values: ['Hello', 'World', 'AI'],
});
Available Models

| Model | Dimensions | Size | Speed | Use Case |
| --- | --- | --- | --- | --- |
| Xenova/all-MiniLM-L6-v2 | 384 | 22MB | ⚡⚡⚡ | General purpose, fastest |
| Xenova/all-MiniLM-L12-v2 | 384 | 33MB | ⚡⚡ | Better accuracy |
| Xenova/all-mpnet-base-v2 | 768 | 110MB | ⚡ | Highest quality |
| Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | 117MB | ⚡⚡ | 50+ languages |
| Xenova/e5-small-v2 | 384 | 33MB | ⚡⚡⚡ | E5 family, fast |
| Xenova/bge-small-en-v1.5 | 384 | 33MB | ⚡⚡⚡ | BGE family |

With Vector Database

import { createVectorDB, embed, embedMany, semanticSearch } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const db = await createVectorDB({ name: 'docs', dimensions: 384 });

// Index documents
const documents = [
  'Machine learning enables computers to learn from data',
  'Deep learning uses neural networks with many layers',
  'Natural language processing analyzes human language',
];

const { embeddings } = await embedMany({ model, values: documents });

await db.addMany(
  documents.map((text, i) => ({
    id: `doc-${i}`,
    vector: embeddings[i],
    metadata: { text },
  }))
);

// Search
const results = await semanticSearch({
  db,
  model,
  query: 'How do neural networks work?',
  k: 3,
});
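
The shape of each match below is an assumption (an id, a similarity score, and the metadata stored at index time); a minimal sketch of consuming the results:

// Hypothetical result shape: { id, score, metadata }
for (const result of results) {
  console.log(result.score.toFixed(3), result.metadata.text);
}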

Progress Tracking

const { embeddings } = await embedMany({
  model,
  values: largeDocumentArray,
  onProgress: (progress) => {
    const percent = (progress.completed / progress.total * 100).toFixed(1);
    console.log(`Embedding: ${percent}%`);
  },
});

Model Configuration

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
  quantized: true,     // Use quantized model (default: true)
  revision: 'main',    // Model revision
  progress: (p) => {
    console.log(`Loading model: ${(p.progress * 100).toFixed(1)}%`);
  },
});

Multilingual Embeddings

For multilingual applications:

const model = transformers.embedding('Xenova/paraphrase-multilingual-MiniLM-L12-v2');

const { embeddings } = await embedMany({
  model,
  values: [
    'Hello world',           // English
    'Bonjour le monde',      // French
    'Hola mundo',            // Spanish
    'こんにちは世界',          // Japanese
    'مرحبا بالعالم',         // Arabic
  ],
});

// All embeddings are in the same vector space
// Cross-lingual similarity works!
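
Since translations land in the same vector space, their similarity can be checked directly. A minimal sketch using cosineSimilarity from @localmode/core:

import { cosineSimilarity } from '@localmode/core';

// 'Hello world' (index 0) vs. 'Bonjour le monde' (index 1):
// translations should score far higher than unrelated pairs
const enFr = cosineSimilarity(embeddings[0], embeddings[1]);
console.log('EN-FR similarity:', enFr.toFixed(3));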

Comparison: Model Quality vs Speed

import { cosineSimilarity } from '@localmode/core';

// Test sentences
const s1 = 'The cat sits on the mat';
const s2 = 'A feline rests on a rug';
const s3 = 'The stock market crashed yesterday';

// Fast model
const fastModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const { embeddings: fastEmbeddings } = await embedMany({
  model: fastModel,
  values: [s1, s2, s3],
});

// Quality model
const qualityModel = transformers.embedding('Xenova/all-mpnet-base-v2');
const { embeddings: qualityEmbeddings } = await embedMany({
  model: qualityModel,
  values: [s1, s2, s3],
});

// Compare similarities
console.log('Fast model:');
console.log('  s1-s2:', cosineSimilarity(fastEmbeddings[0], fastEmbeddings[1]).toFixed(3));
console.log('  s1-s3:', cosineSimilarity(fastEmbeddings[0], fastEmbeddings[2]).toFixed(3));

console.log('Quality model:');
console.log('  s1-s2:', cosineSimilarity(qualityEmbeddings[0], qualityEmbeddings[1]).toFixed(3));
console.log('  s1-s3:', cosineSimilarity(qualityEmbeddings[0], qualityEmbeddings[2]).toFixed(3));

Caching Embeddings

Use caching middleware to avoid recomputation:

import { wrapEmbeddingModel, cachingMiddleware } from '@localmode/core';

const baseModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');

const model = wrapEmbeddingModel(baseModel, [
  cachingMiddleware({
    maxSize: 10000,
    storage: 'indexeddb',
    dbName: 'embedding-cache',
  }),
]);

// First call computes embedding
const { embedding: e1 } = await embed({ model, value: 'Hello' });

// Second call returns from cache (instant)
const { embedding: e2 } = await embed({ model, value: 'Hello' });
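
One way to see the cache at work is to time both calls; console.time is standard Node/browser timing, not part of @localmode:

console.time('uncached');
await embed({ model, value: 'Caching test' });
console.timeEnd('uncached'); // runs model inference

console.time('cached');
await embed({ model, value: 'Caching test' });
console.timeEnd('cached'); // served from the cache, should be near-instant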

Best Practices

Embedding Tips

  1. Match dimensions — Ensure your vector DB dimensions match the model
  2. Batch when possible — embedMany() is more efficient than multiple embed() calls
  3. Cache embeddings — Use caching middleware for repeated queries
  4. Normalize if needed — Some models benefit from L2 normalization (see the sketch after this list)
  5. Choose model wisely — Balance quality vs speed for your use case
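
For tip 4, a minimal L2 normalization sketch; it assumes embeddings are plain number[] arrays and is not specific to @localmode:

// L2-normalize a vector so cosine similarity reduces to a dot product
function l2Normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

const { embedding } = await embed({ model, value: 'Hello' });
const normalized = l2Normalize(embedding);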
