# Transformers

## Overview

`@localmode/transformers` is the HuggingFace Transformers.js provider for LocalMode. Run ML models locally in the browser with WebGPU/WASM acceleration.
## Features
- 🚀 Browser-Native — Run ML models directly in the browser
- 🔒 Privacy-First — All processing happens locally
- 📦 Model Caching — Models cached in IndexedDB for instant subsequent loads
- ⚡ Optimized — Uses quantized models for smaller size and faster inference
## Installation

```bash
pnpm install @localmode/transformers @localmode/core
```

```bash
npm install @localmode/transformers @localmode/core
```

```bash
yarn add @localmode/transformers @localmode/core
```

## Quick Start

```ts
import { transformers } from '@localmode/transformers';
import { embed, rerank } from '@localmode/core';
// Text Embeddings
const embeddingModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const { embedding } = await embed({ model: embeddingModel, value: 'Hello world' });
// Reranking for RAG
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');
const { results } = await rerank({
model: rerankerModel,
query: 'What is machine learning?',
documents: ['ML is a subset of AI...', 'Python is a language...'],
topK: 5,
});
```
## ✅ Live Features

These features are production-ready and fully documented.
### Embeddings
Generate text embeddings for semantic search and RAG.
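As a sketch of how this enables semantic search: embed the query and each document, then rank documents by cosine similarity. This assumes, per the Quick Start, that `embedding` is a plain numeric vector.

```ts
import { transformers } from '@localmode/transformers';
import { embed } from '@localmode/core';

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2');

// Standard cosine similarity between two vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const docs = ['ML is a subset of AI...', 'Python is a language...'];
const { embedding: queryVec } = await embed({ model, value: 'What is machine learning?' });

// Score each document against the query and pick the best match.
const scored: { doc: string; score: number }[] = [];
for (const doc of docs) {
  const { embedding } = await embed({ model, value: doc });
  scored.push({ doc, score: cosineSimilarity(queryVec, embedding) });
}
scored.sort((a, b) => b.score - a.score);
console.log(scored[0].doc); // most relevant document
```

For larger corpora, precompute and cache the document embeddings so only the query needs embedding at search time.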
### Reranking
Improve RAG accuracy with cross-encoder reranking.
| Method | Interface | Description |
|---|---|---|
| transformers.embedding(modelId) | EmbeddingModel | Text embeddings |
| transformers.reranker(modelId) | RerankerModel | Document reranking |
## Recommended Models

The examples in this guide use Xenova/all-MiniLM-L6-v2 for embeddings and Xenova/ms-marco-MiniLM-L-6-v2 for reranking; both are good starting points.
## 🚧 Coming Soon

The features below have interfaces defined and implementations available, but they are still under active development and testing: they are not yet production-ready, and their APIs may change before a stable release. Full documentation will be added once they stabilize.
### Classification & NLP
| Feature | Method | Interface |
|---|---|---|
| Text Classification | transformers.classifier(modelId) | ClassificationModel |
| Zero-Shot Classification | transformers.zeroShotClassifier(modelId) | ZeroShotClassificationModel |
| Named Entity Recognition | transformers.ner(modelId) | NERModel |
### Translation & Text Processing
| Feature | Method | Interface |
|---|---|---|
| Translation | transformers.translator(modelId) | TranslationModel |
| Summarization | transformers.summarizer(modelId) | SummarizationModel |
| Fill-Mask | transformers.fillMask(modelId) | FillMaskModel |
| Question Answering | transformers.questionAnswering(modelId) | QuestionAnsweringModel |
### Audio
| Feature | Method | Interface |
|---|---|---|
| Speech-to-Text | transformers.speechToText(modelId) | SpeechToTextModel |
| Text-to-Speech | transformers.textToSpeech(modelId) | TextToSpeechModel |
### Vision
| Feature | Method | Interface |
|---|---|---|
| Image Classification | transformers.imageClassifier(modelId) | ImageClassificationModel |
| Zero-Shot Image Classification | transformers.zeroShotImageClassifier(modelId) | ZeroShotImageClassificationModel |
| Image Captioning | transformers.captioner(modelId) | ImageCaptionModel |
| Image Segmentation | transformers.segmenter(modelId) | SegmentationModel |
| Object Detection | transformers.objectDetector(modelId) | ObjectDetectionModel |
| OCR | transformers.ocr(modelId) | OCRModel |
| Document QA | transformers.documentQA(modelId) | DocumentQAModel |
## Model Options

Configure model loading:

```ts
const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
quantized: true, // Use quantized model (smaller, faster)
revision: 'main', // Model revision
progress: (p) => {
console.log(`Loading: ${(p.progress * 100).toFixed(1)}%`);
},
});
```
## Model Utilities

Manage model loading and caching:

```ts
import { preloadModel, isModelCached, getModelStorageUsage } from '@localmode/transformers';
// Check if model is cached
const cached = await isModelCached('Xenova/all-MiniLM-L6-v2');
// Preload model with progress
await preloadModel('Xenova/all-MiniLM-L6-v2', {
onProgress: (p) => console.log(`${p.progress}% loaded`),
});
// Check storage usage
const usage = await getModelStorageUsage();
```
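Putting these together, a minimal app-startup sketch using only the utilities above (the shape of the progress callback follows the earlier example):

```ts
import { preloadModel, isModelCached } from '@localmode/transformers';

const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';

// Run during app initialization: download the model only if it isn't
// already cached in IndexedDB, so repeat visits are ready instantly.
async function ensureModelReady(): Promise<void> {
  if (await isModelCached(MODEL_ID)) return;
  await preloadModel(MODEL_ID, {
    onProgress: (p) => console.log(`Downloading ${MODEL_ID}: ${p.progress}%`),
  });
}

await ensureModelReady();
```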
## WebGPU Detection

Detect WebGPU availability for optimal device selection:

```ts
import { isWebGPUAvailable, getOptimalDevice } from '@localmode/transformers';
// Check if WebGPU is available
const webgpuAvailable = await isWebGPUAvailable();
if (webgpuAvailable) {
console.log('WebGPU available, using GPU acceleration');
} else {
console.log('Falling back to WASM');
}
// Get optimal device automatically
const device = await getOptimalDevice(); // 'webgpu' or 'wasm'
const model = transformers.embedding('Xenova/all-MiniLM-L6-v2', {
device, // Uses WebGPU if available, otherwise WASM
});
```

## Browser Compatibility
| Browser | WebGPU | WASM | Notes |
|---|---|---|---|
| Chrome 113+ | ✅ | ✅ | Best performance with WebGPU |
| Edge 113+ | ✅ | ✅ | Same as Chrome |
| Firefox | ❌ | ✅ | WASM only |
| Safari 18+ | ✅ | ✅ | WebGPU available |
| iOS Safari | ✅ | ✅ | WebGPU available (iOS 26+) |
## Performance Tips
- Use quantized models - Smaller and faster with minimal quality loss
- Preload models - Load during app init for instant inference
- Use WebGPU when available - 3-5x faster than WASM
- Batch operations - Process multiple inputs together (see the sketch below)
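For the last tip, one simple way to process several inputs together is to issue concurrent `embed` calls, as sketched below; this is a generic pattern, not a dedicated batch API, and if the library provides one, prefer it.

```ts
import { transformers } from '@localmode/transformers';
import { embed } from '@localmode/core';

const model = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const inputs = ['first text', 'second text', 'third text'];

// Issue the embed calls concurrently instead of awaiting them one at a time.
const embeddings = await Promise.all(
  inputs.map(async (value) => (await embed({ model, value })).embedding),
);
console.log(embeddings.length); // 3
```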