Introduction
Local-first, privacy-first, offline-first AI for the browser. 13 packages. 32 demo apps. Zero cloud dependencies.
LocalMode
LocalMode is a modular, local-first AI engine for the browser. Run embeddings, vector search, RAG pipelines, LLM chat, agents, structured output, text classification, NER, translation, summarization, speech-to-text, text-to-speech, image captioning, object detection, image segmentation, OCR, document Q&A, multimodal search, and evaluation — all directly in the browser with zero server dependencies.
Privacy by Default
All processing happens locally. No data ever leaves the user's device. Zero telemetry. Zero tracking. Built-in encryption, PII redaction, and differential privacy.
Why LocalMode?
- Privacy — Data never leaves the device. No telemetry, no tracking, no network requests from core.
- Offline — Works without internet after model download. Automatic fallbacks for every capability.
- Fast — No network latency. WebGPU acceleration where available. Instant inference.
- Free — No API costs, no rate limits, unlimited usage.
- Universal — Works in Chrome, Edge, Firefox, and Safari. Adapts to device capabilities.
- Interoperable — Vercel AI SDK patterns. LangChain.js adapters. Import from Pinecone/ChromaDB.
Packages
Core & React
@localmode/core
Zero-dependency core — VectorDB (HNSW, typed metadata, WebGPU search, quantization), pipelines, inference queue, model cache, agent framework, evaluation SDK, import/export, middleware, security.
@localmode/react
46 React hooks for all core functions — useChat, useEmbed, useClassify, useTranscribe, and more with built-in loading, error handling, and cancellation.
AI Providers
@localmode/transformers
HuggingFace Transformers.js provider — 25 model factories for embeddings, classification, vision, audio, OCR, multimodal (CLIP), and LLM inference via ONNX.
@localmode/webllm
WebLLM provider — 30 curated WebGPU models including DeepSeek-R1, Qwen3, Llama 3.2, Phi 3.5 Vision. Fastest LLM inference.
@localmode/wllama
GGUF model provider via llama.cpp WASM — 135K+ HuggingFace models, GGUF metadata inspection, universal browser support.
@localmode/chrome-ai
Chrome Built-in AI provider — zero-download inference via Gemini Nano with automatic fallback.
Ecosystem
@localmode/ai-sdk
Vercel AI SDK provider — use generateText, streamText, and embed from the ai package with local models.
@localmode/langchain
LangChain.js adapters — drop-in local embeddings, chat, vector store, and reranker for existing LangChain apps.
@localmode/devtools
In-app DevTools widget for model cache, VectorDB stats, inference queue, and pipeline observability.
Storage & Utilities
@localmode/pdfjs
PDF text extraction for document processing pipelines.
@localmode/dexie
Enhanced IndexedDB storage with Dexie.js — schema versioning and transactions.
@localmode/idb
Minimal IndexedDB storage adapter (~3KB) for lightweight apps.
@localmode/localforage
Cross-browser storage with automatic IndexedDB/WebSQL/localStorage fallback.
Quick Start
Install packages
```bash
pnpm install @localmode/core @localmode/transformers
```

```bash
npm install @localmode/core @localmode/transformers
```

```bash
yarn add @localmode/core @localmode/transformers
```

```bash
bun add @localmode/core @localmode/transformers
```

Semantic search with embeddings
```ts
import { embed, embedMany, createVectorDB, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384,
});

// Chunk and embed documents
const chunks = chunk(documentText, { size: 512, overlap: 50 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });
```

LLM chat with streaming
Three providers implement the same LanguageModel interface — choose based on your needs:
```ts
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

// Pick any provider — all share the same LanguageModel interface
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// const model = wllama.languageModel('Llama-3.2-1B-Instruct-Q4_K_M');
// const model = transformers.languageModel('onnx-community/Qwen3-0.6B-ONNX');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}
```

Structured output
```ts
import { generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});
```

React hooks
```tsx
import { useChat, useEmbed, useClassify } from '@localmode/react';
import { webllm } from '@localmode/webllm';

function ChatApp() {
  const { messages, sendMessage, isStreaming } = useChat({
    model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  });
  return <ChatUI messages={messages} onSend={sendMessage} loading={isStreaming} />;
}
```

Features
Core AI Functions
| Feature | Functions | Description |
|---|---|---|
| Embeddings | embed(), embedMany(), streamEmbedMany() | Text embeddings with streaming and batching |
| Multimodal Embeddings | embedImage(), embedManyImages() | CLIP-based text-image cross-modal search |
| Streaming LLM | streamText(), generateText() | Streaming and complete text generation |
| Structured Output | generateObject(), streamObject() | Typed JSON generation with Zod schema validation |
| Classification | classify(), classifyZeroShot(), classifyMany() | Sentiment, intent, topic classification |
| NER | extractEntities() | Named entity recognition |
| Reranking | rerank() | Document reranking for improved RAG |
| Translation | translate() | Multi-language translation (20+ languages) |
| Summarization | summarize() | Text summarization |
| Question Answering | answerQuestion() | Extractive QA with confidence scores |
| Fill-Mask | fillMask() | Masked token prediction (BERT-style) |
| OCR | extractText() | Optical character recognition |
| Document QA | askDocument(), askTable() | Visual document and table understanding |
| Audio | transcribe(), synthesizeSpeech(), classifyAudio() | Speech-to-text, TTS, audio classification |
| Vision | classifyImage(), captionImage(), detectObjects(), segmentImage() | Image processing and analysis |
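Many of the functions above operate on plain numeric vectors. For intuition, comparing two embeddings returned by embed() reduces to cosine similarity; a minimal standalone sketch (not the library's internal code):

```typescript
// Cosine similarity between two embedding vectors.
// Assumes equal-length, non-zero vectors, as returned by embed().
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical directions score 1; orthogonal vectors score 0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Scores near 1 mean semantically similar text; typical "related" thresholds for real embedding models sit well below 1, which is what the threshold-calibration helpers below estimate empirically.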
Vector Database & RAG
| Feature | Functions | Description |
|---|---|---|
| Vector Database | createVectorDB() | HNSW index, IndexedDB persistence, cross-tab sync, typed metadata |
| Semantic Search | semanticSearch(), streamSemanticSearch() | Query-time embed + search in one call |
| Quantization | createVectorDB({ quantization }) | SQ8 (4x) and Product Quantization (8-32x compression) |
| WebGPU Search | createGPUDistanceComputer() | WGSL compute shaders for batch distance computation |
| Hybrid Search | createHybridSearch(), reciprocalRankFusion() | BM25 keyword + vector semantic search fusion |
| Chunking | chunk(), semanticChunk(), codeChunk(), markdownChunk() | Recursive, semantic, code-aware, and markdown chunking |
| Pipelines | createPipeline() | Composable multi-step workflows with 10 built-in step types |
| Inference Queue | createInferenceQueue() | Priority-based task scheduling with concurrency control |
| Semantic Cache | createSemanticCache() | Cache LLM responses using embedding similarity |
| Import/Export | importFrom(), exportToCSV(), exportToJSONL() | Migrate vectors from Pinecone, ChromaDB, CSV, JSONL |
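The fusion step behind hybrid search combines a BM25 keyword ranking with a vector ranking via reciprocal rank fusion: each ranking contributes 1/(k + rank) per document. A minimal standalone sketch of the idea (illustrative only; the library's reciprocalRankFusion signature may differ):

```typescript
// Reciprocal rank fusion over any number of rankings of document ids.
// k = 60 is the conventional smoothing constant; it damps the influence
// of top ranks so one retriever cannot dominate the fused order.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // rank is 1-based, so position i contributes 1 / (k + i + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "b" ranks highly in both lists, so it fuses to the top.
const fused = rrf([
  ["a", "b", "c"], // e.g. BM25 keyword ranking
  ["b", "c", "a"], // e.g. vector similarity ranking
]);
console.log(fused[0]); // "b"
```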
Agents & Evaluation
| Feature | Functions | Description |
|---|---|---|
| Agent Framework | createAgent(), runAgent() | ReAct loop with tool registry and VectorDB-backed memory |
| Evaluation SDK | evaluateModel(), accuracy(), bleuScore(), ndcg() | Classification, generation, and retrieval metrics |
| Threshold Calibration | calibrateThreshold(), getDefaultThreshold() | Empirical similarity thresholds from corpus data |
| Model Registry | recommendModels(), registerModel() | Curated model catalog with device-aware recommendations |
| Adaptive Batching | computeOptimalBatchSize() | Device-aware batch sizing for optimal throughput |
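Retrieval metrics such as nDCG are compact formulas worth seeing spelled out. A minimal sketch of nDCG@k (for illustration; the library's ndcg() signature may differ):

```typescript
// DCG: graded relevance discounted by log2 of the rank position.
function dcg(relevances: number[]): number {
  return relevances.reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the first k results, normalized by the DCG of the
// ideal (descending-relevance) ordering, so a perfect ranking scores 1.
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcg(ideal.slice(0, k));
  return idealDcg === 0 ? 0 : dcg(relevances.slice(0, k)) / idealDcg;
}

console.log(ndcgAtK([3, 2, 1], 3)); // 1 (already in ideal order)
console.log(ndcgAtK([0, 0, 3], 3)); // 0.5 (relevant doc buried at rank 3)
```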
Security & Privacy
| Feature | Functions | Description |
|---|---|---|
| Encryption | encrypt(), decrypt(), deriveKey() | Web Crypto API encryption, PBKDF2 key derivation |
| PII Redaction | redactPII(), piiRedactionMiddleware() | Named entity based PII detection and redaction |
| Differential Privacy | dpEmbeddingMiddleware(), createPrivacyBudget() | DP noise injection for embeddings and classification |
| Drift Detection | checkModelCompatibility(), reindexCollection() | Detect model changes, auto-reindex collections |
LLM Provider Comparison
| | WebLLM | Wllama | Transformers.js |
|---|---|---|---|
| Runtime | WebGPU | WASM (llama.cpp) | ONNX Runtime |
| Models | 30 curated (MLC) | 135K+ GGUF from HuggingFace | 14 ONNX (TJS v4) |
| Speed | Fastest (GPU) | Good (CPU) | Good (CPU/GPU) |
| Vision | Phi 3.5 Vision | — | Qwen3.5 Vision |
| Browser Support | Chrome/Edge 113+ | All modern browsers | All modern browsers |
| Best For | Maximum performance | Universal compatibility, model variety | Multi-task (embeddings + LLM in one package) |
Architecture
LocalMode follows a "zero-dependency core, thin provider wrappers" architecture:
```
┌───────────────────────────────────────────────────────────────────────┐
│                            Your Application                           │
├───────────────────────────────────────────────────────────────────────┤
│                      @localmode/react (46 hooks)                      │
├───────────────────────┬───────────────────────┬───────────────────────┤
│   @localmode/ai-sdk   │ @localmode/langchain  │  @localmode/devtools  │
├───────────────────────┴───────────────────────┴───────────────────────┤
│                            @localmode/core                            │
│                                                                       │
│   VectorDB     Embeddings     Generation     Agents & Pipelines       │
│   (HNSW +      + Multimodal   + Structured   + Evaluation             │
│   WebGPU)                     Output         + Metrics                │
│                                                                       │
│   Security     Middleware     Import/Export  Model Cache              │
│   (DP, PII,    System                        + Registry               │
│   Crypto)                                                             │
├───────────────────────────────────────────────────────────────────────┤
│                           Provider Packages                           │
│                                                                       │
│   @localmode/transformers    HF Transformers.js    25 factories       │
│   @localmode/webllm          WebGPU                30 models          │
│   @localmode/wllama          llama.cpp WASM        135K+ models       │
│   @localmode/chrome-ai       Gemini Nano           zero-download      │
├───────────────────────────────────────────────────────────────────────┤
│                             Browser APIs                              │
│                                                                       │
│             WebGPU · IndexedDB · Web Workers · Web Crypto             │
└───────────────────────────────────────────────────────────────────────┘
```

Demo Applications
See LocalMode in action at localmode.ai — 32 apps showcasing every feature.
| Category | Apps |
|---|---|
| Chat & Agents | LLM Chat, Research Agent, GGUF Explorer |
| Audio | Voice Notes, Meeting Assistant, Audiobook Creator |
| Text & NLP | Smart Writer, Data Extractor, Sentiment Analyzer, Email Classifier, Translator, Text Summarizer, Q&A Bot, Smart Autocomplete, Invoice Q&A |
| Vision | Background Remover, Smart Gallery, Product Search, Cross-Modal Search, Image Captioner, OCR Scanner, Object Detector, Duplicate Finder, Photo Enhancer |
| RAG & Search | PDF Search, Semantic Search, LangChain RAG, Data Migrator |
| Privacy | Document Redactor, Encrypted Vault |
| Developer Tools | Model Advisor, Model Evaluator |
Browser Compatibility
| Browser | WebGPU | WASM | IndexedDB | Workers | Chrome AI |
|---|---|---|---|---|---|
| Chrome 138+ | Yes | Yes | Yes | Yes | Yes |
| Edge 138+ | Yes | Yes | Yes | Yes | Yes |
| Firefox 75+ | Nightly | Yes | Yes | Yes | No |
| Safari 18+ | Yes | Yes | Yes | Partial | No |
Platform Notes
- Chrome AI: Zero-download inference via Gemini Nano (fallback to Transformers.js)
- WebGPU: 3-5x faster inference (fallback to WASM)
- Safari/iOS: Private browsing blocks IndexedDB — use MemoryStorage fallback
- Firefox: WebGPU only in Nightly — WASM fallback is automatic
- SharedArrayBuffer: Requires cross-origin isolation for some features
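The fallback behavior described in these notes starts with simple feature detection. A sketch of the runtime selection logic, written as a pure function over a navigator-like object so it is easy to test (the gpu field mirrors the real navigator.gpu; the priority order is illustrative):

```typescript
// Minimal shape of the capability we probe on `navigator`.
interface NavigatorLike {
  gpu?: unknown; // navigator.gpu is present when the browser exposes WebGPU
}

type Runtime = "webgpu" | "wasm";

// Prefer WebGPU when the browser exposes it; otherwise fall back to
// WASM, which every modern browser supports.
function pickRuntime(nav: NavigatorLike): Runtime {
  return nav.gpu ? "webgpu" : "wasm";
}

// In the browser you would call pickRuntime(navigator).
console.log(pickRuntime({ gpu: {} })); // "webgpu"
console.log(pickRuntime({}));          // "wasm"
```

A production check would also await navigator.gpu.requestAdapter(), since the property can exist while no adapter is available.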
Next Steps
Getting Started
Full installation guide, bundler config, and first steps.
Core Package
VectorDB, embeddings, RAG, agents, evaluation, middleware, and security.
React Hooks
46 hooks for chat, embeddings, classification, vision, audio, and more.
Transformers Provider
25 model factories for text, vision, audio, and multimodal tasks.
WebLLM Provider
30 curated WebGPU models for fastest LLM inference.
Wllama Provider
135K+ GGUF models via llama.cpp WASM — universal browser support.