# Transformers

HuggingFace Transformers.js provider for browser-based ML inference.

## Overview

`@localmode/transformers` is the HuggingFace Transformers.js provider for LocalMode. Run ML models locally in the browser with WebGPU/WASM acceleration.

## Features

- 🚀 **Browser-Native** — Run ML models directly in the browser
- 🔒 **Privacy-First** — All processing happens locally
- 📦 **Model Caching** — Models cached in IndexedDB for instant subsequent loads
- ⚡ **Optimized** — Uses quantized models for smaller size and faster inference

## Installation

```bash
# pnpm
pnpm install @localmode/transformers @localmode/core

# npm
npm install @localmode/transformers @localmode/core

# yarn
yarn add @localmode/transformers @localmode/core
```

## Quick Start

```ts
import { transformers } from '@localmode/transformers';
import { embed, rerank } from '@localmode/core';

// Text embeddings
const embeddingModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const { embedding } = await embed({ model: embeddingModel, value: 'Hello world' });

// Reranking for RAG
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');
const { results } = await rerank({
  model: rerankerModel,
  query: 'What is machine learning?',
  documents: ['ML is a subset of AI...', 'Python is a language...'],
  topK: 5,
});
```

## ✅ Live Features

These features are production-ready and fully documented.

| Method | Interface | Description |
| --- | --- | --- |
| `transformers.embedding(modelId)` | `EmbeddingModel` | Text embeddings |
| `transformers.reranker(modelId)` | `RerankerModel` | Document reranking |
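
The two live features compose into the classic retrieve-then-rerank pattern for RAG. The sketch below is illustrative rather than part of the documented API: it assumes `embedding` resolves to a plain `number[]`, and the `cosine` helper and `search` function are hypothetical names; only `embed` and `rerank` come from `@localmode/core`.

```ts
import { transformers } from '@localmode/transformers';
import { embed, rerank } from '@localmode/core';

// Cosine similarity between two dense vectors (assumes equal length).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const embedder = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const reranker = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

async function search(query: string, corpus: string[], topK = 3) {
  // 1. Embed the query and every document (the model is loaded once and reused).
  const { embedding: q } = await embed({ model: embedder, value: query });
  const docVectors = await Promise.all(
    corpus.map((doc) => embed({ model: embedder, value: doc })),
  );

  // 2. Coarse retrieval: shortlist the most similar documents by cosine score.
  const candidates = docVectors
    .map(({ embedding }, i) => ({ doc: corpus[i], score: cosine(q, embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK * 2)
    .map((c) => c.doc);

  // 3. Precise pass: rerank only the shortlist with the cross-encoder.
  const { results } = await rerank({ model: reranker, query, documents: candidates, topK });
  return results;
}
```

Reranking only the shortlist keeps the expensive cross-encoder pass small, while the cheap cosine pass scans the whole corpus.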

## 🚧 Coming Soon

These features have interfaces defined and implementations available, but they are under active development and testing and are not yet production-ready. APIs may change before the stable release; full documentation will be added once they stabilize.

### Classification & NLP

| Feature | Method | Interface |
| --- | --- | --- |
| Text Classification | `transformers.classifier(modelId)` | `ClassificationModel` |
| Zero-Shot Classification | `transformers.zeroShotClassifier(modelId)` | `ZeroShotClassificationModel` |
| Named Entity Recognition | `transformers.ner(modelId)` | `NERModel` |

### Translation & Text Processing

| Feature | Method | Interface |
| --- | --- | --- |
| Translation | `transformers.translator(modelId)` | `TranslationModel` |
| Summarization | `transformers.summarizer(modelId)` | `SummarizationModel` |
| Fill-Mask | `transformers.fillMask(modelId)` | `FillMaskModel` |
| Question Answering | `transformers.questionAnswering(modelId)` | `QuestionAnsweringModel` |

### Audio

| Feature | Method | Interface |
| --- | --- | --- |
| Speech-to-Text | `transformers.speechToText(modelId)` | `SpeechToTextModel` |
| Text-to-Speech | `transformers.textToSpeech(modelId)` | `TextToSpeechModel` |

### Vision

| Feature | Method | Interface |
| --- | --- | --- |
| Image Classification | `transformers.imageClassifier(modelId)` | `ImageClassificationModel` |
| Zero-Shot Image Classification | `transformers.zeroShotImageClassifier(modelId)` | `ZeroShotImageClassificationModel` |
| Image Captioning | `transformers.captioner(modelId)` | `ImageCaptionModel` |
| Image Segmentation | `transformers.segmenter(modelId)` | `SegmentationModel` |
| Object Detection | `transformers.objectDetector(modelId)` | `ObjectDetectionModel` |
| OCR | `transformers.ocr(modelId)` | `OCRModel` |
| Document QA | `transformers.documentQA(modelId)` | `DocumentQAModel` |

## Model Options

Configure model loading:

```ts
const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
  quantized: true, // Use quantized model (smaller, faster)
  revision: 'main', // Model revision
  progress: (p) => {
    console.log(`Loading: ${(p.progress * 100).toFixed(1)}%`);
  },
});
```

## Model Utilities

Manage model loading and caching:

```ts
import { preloadModel, isModelCached, getModelStorageUsage } from '@localmode/transformers';

// Check if model is cached
const cached = await isModelCached('Xenova/all-MiniLM-L6-v2');

// Preload model with progress
await preloadModel('Xenova/all-MiniLM-L6-v2', {
  onProgress: (p) => console.log(`${p.progress}% loaded`),
});

// Check storage usage
const usage = await getModelStorageUsage();
```
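
These utilities fit naturally into app startup: skip the download when the model is already cached, and preload it otherwise so the first inference is instant (performance tip 2 below). A minimal sketch, assuming `getModelStorageUsage` resolves to a byte count (the actual return shape may differ); `warmUpModels` is a hypothetical helper name:

```ts
import { preloadModel, isModelCached, getModelStorageUsage } from '@localmode/transformers';

const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';

// Run once during app initialization, before the first embed/rerank call.
async function warmUpModels(): Promise<void> {
  if (!(await isModelCached(MODEL_ID))) {
    await preloadModel(MODEL_ID, {
      onProgress: (p) => console.log(`Downloading ${MODEL_ID}: ${p.progress}% loaded`),
    });
  }

  // Report how much local storage the cached models occupy
  // (assumes a byte count; adjust if the API returns a structured object).
  const usage = await getModelStorageUsage();
  console.log(`Model cache size: ${(Number(usage) / 1024 / 1024).toFixed(1)} MB`);
}
```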

## WebGPU Detection

Detect WebGPU availability for optimal device selection:

```ts
import { isWebGPUAvailable, getOptimalDevice } from '@localmode/transformers';

// Check if WebGPU is available
const webgpuAvailable = await isWebGPUAvailable();

if (webgpuAvailable) {
  console.log('WebGPU available, using GPU acceleration');
} else {
  console.log('Falling back to WASM');
}

// Get optimal device automatically
const device = await getOptimalDevice(); // 'webgpu' or 'wasm'

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
  device, // Uses WebGPU if available, otherwise WASM
});
```

## Browser Compatibility

| Browser | WebGPU | WASM | Notes |
| --- | --- | --- | --- |
| Chrome 113+ | ✅ | ✅ | Best performance with WebGPU |
| Edge 113+ | ✅ | ✅ | Same as Chrome |
| Firefox | ❌ | ✅ | WASM only |
| Safari 18+ | ✅ | ✅ | WebGPU available |
| iOS Safari | ✅ | ✅ | WebGPU available (iOS 26+) |
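
If you need to feature-detect without the helper, WebGPU availability ultimately comes down to whether the browser can hand out a GPU adapter. The standalone sketch below uses only standard web APIs (`navigator.gpu.requestAdapter`); it is an illustration of the general technique, not `isWebGPUAvailable`'s actual implementation:

```ts
// Standalone WebGPU check using standard web APIs (not the library's internals).
async function detectWebGPU(): Promise<boolean> {
  // navigator.gpu is undefined in browsers without WebGPU (e.g. current Firefox).
  if (!('gpu' in navigator)) return false;
  try {
    // requestAdapter resolves to null when no suitable GPU adapter exists.
    const adapter = await (navigator as any).gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```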

## Performance Tips

1. **Use quantized models** - Smaller and faster with minimal quality loss
2. **Preload models** - Load during app init for instant inference
3. **Use WebGPU when available** - 3-5x faster than WASM
4. **Batch operations** - Process multiple inputs together (see the sketch below)
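
On tip 4: the documented `embed` call takes a single value, so this sketch batches by issuing concurrent calls with `Promise.all`. The model loads once and is reused across calls, which amortizes the startup cost; `embedAll` is a hypothetical helper, not a documented batching API:

```ts
import { transformers } from '@localmode/transformers';
import { embed } from '@localmode/core';

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2');

// Embed many inputs concurrently; the model instance is shared,
// so only the first call pays the initialization cost.
async function embedAll(values: string[]): Promise<number[][]> {
  const results = await Promise.all(
    values.map((value) => embed({ model, value })),
  );
  return results.map((r) => r.embedding);
}

const vectors = await embedAll(['first text', 'second text', 'third text']);
```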
