@localmode/transformers

HuggingFace Transformers.js provider for LocalMode. Run ML models locally in the browser with WebGPU/WASM acceleration.

Features

🚀 Browser-Native — Run ML models directly in the browser
🔒 Privacy-First — All processing happens locally
📦 Model Caching — Models cached in IndexedDB for instant subsequent loads
⚡ Optimized — Uses quantized models for smaller size and faster inference

Installation

pnpm install @localmode/transformers @localmode/core

npm install @localmode/transformers @localmode/core

yarn add @localmode/transformers @localmode/core

Quick Start

import { transformers } from '@localmode/transformers';
import { embed, rerank } from '@localmode/core';

// Text Embeddings
const embeddingModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const { embedding } = await embed({ model: embeddingModel, value: 'Hello world' });

// Reranking for RAG
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');
const { results } = await rerank({
  model: rerankerModel,
  query: 'What is machine learning?',
  documents: ['ML is a subset of AI...', 'Python is a language...'],
  topK: 5,
});

✅ Live Features

These features are production-ready and fully documented.

Embeddings

Generate text embeddings for semantic search and RAG.
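
To make the semantic-search use case concrete, here is a minimal sketch of how embedding vectors are usually compared. It assumes an embedding is a plain numeric vector (`number[]` or `Float32Array`); `cosineSimilarity` and `rankBySimilarity` are illustrative helpers written for this example, not exports of `@localmode/transformers` or `@localmode/core`.

```typescript
// Assumption: embeddings are plain numeric vectors of equal length.
type Vector = ArrayLike<number>;

// Cosine similarity: dot product divided by the product of the vector norms.
// Returns 0 when either vector has zero length to avoid dividing by zero.
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank document vectors against a query vector, best match first.
function rankBySimilarity(
  query: Vector,
  docs: Vector[],
): { index: number; score: number }[] {
  return docs
    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
    .sort((x, y) => y.score - x.score);
}
```

In a search flow, the query and each document would be embedded with the same model, and the top-ranked indices would point back at the most similar documents.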

Reranking

Improve RAG accuracy with cross-encoder reranking.

Method	Interface	Description
`transformers.embedding(modelId)`	`EmbeddingModel`	Text embeddings
`transformers.reranker(modelId)`	`RerankerModel`	Document reranking
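
The reranking step can be pictured as scoring each query-document pair jointly and keeping the best few. The sketch below illustrates that idea only: `rerankCandidates`, `PairScorer`, and `wordOverlapScorer` are hypothetical names, and the word-overlap scorer is a toy stand-in for a real cross-encoder, so this is not how the library's `rerank` is implemented.

```typescript
// A pair scorer sees the query and one document together and returns a relevance score.
// In a real pipeline this role is played by a cross-encoder model.
type PairScorer = (query: string, document: string) => number;

interface RankedDocument {
  index: number; // position in the original candidate list
  document: string;
  score: number;
}

// Score every candidate against the query, sort by score (highest first), keep topK.
function rerankCandidates(
  query: string,
  documents: string[],
  scorer: PairScorer,
  topK: number,
): RankedDocument[] {
  return documents
    .map((document, index) => ({ index, document, score: scorer(query, document) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, topK));
}

// Hypothetical stand-in scorer: counts query words that also occur in the document.
const wordOverlapScorer: PairScorer = (query, document) => {
  const docWords = new Set(document.toLowerCase().split(/\W+/).filter(Boolean));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .filter((word) => docWords.has(word)).length;
};
```

A typical RAG pipeline first retrieves a broad candidate set with embeddings, then applies a step like this to the candidates, because pairwise scoring is more accurate but too slow to run over a whole corpus.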

Recommended Models

🚧 Coming Soon

These features have interfaces defined and implementations available, but they are still under active development and testing and are not yet production-ready. APIs may change before a stable release, and full documentation will be added once the features are stable.

Classification & NLP

Feature	Method	Interface
Text Classification	`transformers.classifier(modelId)`	`ClassificationModel`
Zero-Shot Classification	`transformers.zeroShotClassifier(modelId)`	`ZeroShotClassificationModel`
Named Entity Recognition	`transformers.ner(modelId)`	`NERModel`

Translation & Text Processing

Feature	Method	Interface
Translation	`transformers.translator(modelId)`	`TranslationModel`
Summarization	`transformers.summarizer(modelId)`	`SummarizationModel`
Fill-Mask	`transformers.fillMask(modelId)`	`FillMaskModel`
Question Answering	`transformers.questionAnswering(modelId)`	`QuestionAnsweringModel`

Audio

Feature	Method	Interface
Speech-to-Text	`transformers.speechToText(modelId)`	`SpeechToTextModel`
Text-to-Speech	`transformers.textToSpeech(modelId)`	`TextToSpeechModel`

Vision

Feature	Method	Interface
Image Classification	`transformers.imageClassifier(modelId)`	`ImageClassificationModel`
Zero-Shot Image Classification	`transformers.zeroShotImageClassifier(modelId)`	`ZeroShotImageClassificationModel`
Image Captioning	`transformers.captioner(modelId)`	`ImageCaptionModel`
Image Segmentation	`transformers.segmenter(modelId)`	`SegmentationModel`
Object Detection	`transformers.objectDetector(modelId)`	`ObjectDetectionModel`
OCR	`transformers.ocr(modelId)`	`OCRModel`
Document QA	`transformers.documentQA(modelId)`	`DocumentQAModel`

Model Options

Configure model loading:

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
  quantized: true, // Use quantized model (smaller, faster)
  revision: 'main', // Model revision
  progress: (p) => {
    console.log(`Loading: ${(p.progress * 100).toFixed(1)}%`);
  },
});
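
Progress callbacks can fire many times per second during a download. The following sketch wraps a reporting function so it is only invoked when the whole-number percentage changes. It assumes `progress` is a fraction between 0 and 1, as the multiplication by 100 in the snippet above suggests; `createProgressReporter` and `LoadProgress` are hypothetical names introduced for this example.

```typescript
// Assumption: `progress` is a fraction in [0, 1].
interface LoadProgress {
  progress: number;
}

// Returns a callback that forwards a rounded percentage to `report`,
// skipping calls where the rounded value has not changed.
function createProgressReporter(
  report: (percent: number) => void,
): (p: LoadProgress) => void {
  let lastPercent = -1;
  return (p) => {
    const fraction = Number.isFinite(p.progress) ? p.progress : 0;
    // Clamp to [0, 1] so out-of-range values never produce odd percentages.
    const percent = Math.round(Math.min(1, Math.max(0, fraction)) * 100);
    if (percent !== lastPercent) {
      lastPercent = percent;
      report(percent);
    }
  };
}
```

The returned function could then be passed wherever a progress callback is accepted, for example to update a progress bar without redundant re-renders.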

Model Utilities

Manage model loading and caching:

import { preloadModel, isModelCached, getModelStorageUsage } from '@localmode/transformers';

// Check if model is cached
const cached = await isModelCached('Xenova/all-MiniLM-L6-v2');

// Preload model with progress
await preloadModel('Xenova/all-MiniLM-L6-v2', {
  onProgress: (p) => console.log(`${p.progress}% loaded`),
});

// Check storage usage
const usage = await getModelStorageUsage();
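
One way to combine the cache check and preload is a small "download only what is missing" routine at app startup. The sketch below takes the two operations as injected functions so it can run without the package. `ensureModelsReady` and `ModelCacheOps` are hypothetical names, and the assumption that preloading resolves once the model is stored is inferred from the `await` in the snippet above.

```typescript
// The two operations the routine needs, injected so the sketch stays self-contained.
interface ModelCacheOps {
  isCached: (modelId: string) => Promise<boolean>;
  preload: (modelId: string) => Promise<void>;
}

// Preloads every model that is not already cached, one at a time,
// and returns the IDs that actually had to be downloaded.
async function ensureModelsReady(
  modelIds: string[],
  ops: ModelCacheOps,
): Promise<string[]> {
  const downloaded: string[] = [];
  for (const modelId of modelIds) {
    if (!(await ops.isCached(modelId))) {
      await ops.preload(modelId);
      downloaded.push(modelId);
    }
  }
  return downloaded;
}
```

In an application, `isCached` and `preload` would presumably be thin wrappers around `isModelCached` and `preloadModel`. Loading sequentially keeps peak bandwidth and memory use predictable, at the cost of a longer total startup time.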

WebGPU Detection

Detect WebGPU availability for optimal device selection:

import { isWebGPUAvailable, getOptimalDevice } from '@localmode/transformers';

// Check if WebGPU is available
const webgpuAvailable = await isWebGPUAvailable();

if (webgpuAvailable) {
  console.log('WebGPU available, using GPU acceleration');
} else {
  console.log('Falling back to WASM');
}

// Get optimal device automatically
const device = await getOptimalDevice(); // 'webgpu' or 'wasm'

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
  device, // Uses WebGPU if available, otherwise WASM
});
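
For readers curious what such a check involves, here is a minimal sketch built on the standard WebGPU entry point, `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists. It is an illustration under that assumption, not the library's actual implementation; `pickDevice`, `NavigatorLike`, and `GpuLike` are hypothetical names, and the navigator is passed in so the function also runs outside a browser.

```typescript
// Minimal structural types so the sketch compiles without DOM or WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Device = 'webgpu' | 'wasm';

// Returns 'webgpu' only when the API exists AND an adapter can be obtained.
// Any missing API, null adapter, or thrown error falls back to 'wasm'.
async function pickDevice(nav: NavigatorLike | undefined): Promise<Device> {
  if (!nav?.gpu) return 'wasm';
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}
```

Checking for an adapter rather than only for the presence of `navigator.gpu` matters because some browsers expose the API on hardware where no usable adapter is available. In a browser, the global `navigator` (cast to `NavigatorLike` if the WebGPU typings are absent) would be passed in.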

Browser Compatibility

Browser	WebGPU	WASM	Notes
Chrome 113+	✅	✅	Best performance with WebGPU
Edge 113+	✅	✅	Same as Chrome
Firefox	❌	✅	WASM only
Safari 18+	✅	✅	WebGPU available
iOS Safari	✅	✅	WebGPU available (iOS 26+)

Performance Tips

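Use quantized models - Smaller downloads and faster inference with minimal quality loss
Preload models - Load during app initialization so the first inference does not wait on a download
Use WebGPU when available - Typically 3-5x faster than WASM, depending on model and hardware
Batch operations - Process multiple inputs together instead of one at a time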
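
As a sketch of the batching tip, the helper below splits a list of inputs into fixed-size groups so each group can be processed together. `toBatches` is a hypothetical utility written for this example. How a batch is then submitted depends on the API being called, which is not shown here.

```typescript
// Split `items` into consecutive groups of at most `batchSize` elements.
// The final group may be smaller when the length is not an exact multiple.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

The batch size is a trade-off: larger groups amortize per-call overhead, while smaller groups keep peak memory lower, which matters on mobile devices.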

Next Steps

Embeddings

Generate text embeddings for semantic search.

Reranking

Improve RAG accuracy with reranking.

Core Package

Learn about the core LocalMode functions.
