LocalMode
Run ML models entirely in your browser. Embeddings, vector search, LLM chat, vision, audio, agents, and structured output - all offline, all private.
No servers. No API keys. Your data never leaves your device.
Built for the Modern Web
AI in the Browser
Run embeddings, LLMs, classification, vision, audio, and agents directly in the browser with WebGPU and WASM.
Privacy-First
Zero telemetry. No data leaves your device. Built-in encryption, PII redaction, and differential privacy.
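Encryption of this kind needs nothing beyond the browser's native Web Crypto API. The sketch below is a minimal AES-GCM round trip with `crypto.subtle`. It is not LocalMode's own encryption API, and the helpers `generateKey`, `encryptText` and `decryptText` are hypothetical names chosen for illustration.

```typescript
// Minimal AES-GCM round trip on the native Web Crypto API (browsers, Node 19+).
// generateKey/encryptText/decryptText are hypothetical helpers, not LocalMode exports.

async function generateKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey(
    { name: 'AES-GCM', length: 256 },
    false, // non-extractable: raw key bytes cannot be exported
    ['encrypt', 'decrypt']
  );
}

async function encryptText(
  key: CryptoKey,
  plaintext: string
): Promise<{ iv: Uint8Array; data: ArrayBuffer }> {
  // AES-GCM requires a unique 96-bit IV for every message encrypted under the same key.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const data = await crypto.subtle.encrypt(
    { name: 'AES-GCM', iv },
    key,
    new TextEncoder().encode(plaintext)
  );
  return { iv, data };
}

async function decryptText(
  key: CryptoKey,
  payload: { iv: Uint8Array; data: ArrayBuffer }
): Promise<string> {
  // Rejects if the ciphertext or IV was modified (GCM authentication tag check fails).
  const plain = await crypto.subtle.decrypt({ name: 'AES-GCM', iv: payload.iv }, key, payload.data);
  return new TextDecoder().decode(plain);
}
```

GCM is authenticated encryption, so a tampered ciphertext fails to decrypt instead of silently producing garbage.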
Zero-Dependency Core
Core package has no external dependencies. Built entirely on native Web APIs.
Offline-Ready
Models cached in IndexedDB. Works without internet after initial download. Automatic fallbacks.
Interoperable
Vercel AI SDK patterns. LangChain.js adapters. Import vectors from Pinecone and ChromaDB.
Device-Aware
Adaptive batching, model recommendations, and WebGPU acceleration based on device capabilities.
15 Packages
Modular architecture - use only what you need. Zero-dependency core provides everything; providers add ML framework integrations.
Core & React
AI Providers
@localmode/transformers
HuggingFace Transformers.js v4 - 26 model factories for embeddings, vision, audio, OCR, LLM inference.
@localmode/webllm
WebLLM via WebGPU - 32 curated models including DeepSeek-R1, Qwen3, Llama 3.2, Phi 3.5 Vision.
@localmode/litert
Google LiteRT-LM provider - 3 verified models (Gemma 4 E2B/E4B, Qwen3 0.6B); WebGPU + CPU WASM fallback. Text-only.
@localmode/mediapipe
Google MediaPipe Tasks - 13 verified models for landmarks, gestures, face detection, classification, segmentation, language detection, and streaming trackers.
@localmode/wllama
GGUF models via llama.cpp WASM - curated catalog + 160K+ HuggingFace models, universal browser support.
@localmode/chrome-ai
Chrome Built-in AI - zero-download inference via Gemini Nano with automatic fallback.
Ecosystem
@localmode/ai-sdk
Vercel AI SDK provider for local models.
@localmode/langchain
LangChain.js adapters: drop-in local embeddings, chat, vector store, and reranker.
@localmode/devtools
In-app DevTools widget for model cache, VectorDB stats, and inference queue observability.
@localmode/pdfjs
PDF text extraction with PDF.js for document processing pipelines.
Capabilities
From embeddings and vector search to agents, vision, audio, and security - everything runs locally in the browser.
Embeddings & Vector Search
- Text and streaming embeddings
- HNSW index with WebGPU
- SQ8 + PQ compression (4–32x)
- Hybrid BM25 + semantic search
- Multimodal search via CLIP
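SQ8 (8-bit scalar quantization) stores each float32 component in one byte, which accounts for the 4x end of the quoted range; product quantization (PQ) reaches the higher ratios and is not shown. The page does not say how LocalMode chooses the quantization range, so this sketch assumes simple per-vector min/max scaling, and `quantizeSQ8`/`dequantizeSQ8` are hypothetical names.

```typescript
// SQ8 sketch: float32 -> uint8 using a per-vector min/max range (an assumption).
interface SQ8Vector {
  codes: Uint8Array; // one byte per dimension
  min: number; // value mapped to code 0
  scale: number; // (max - min) / 255, the size of one quantization step
}

function quantizeSQ8(v: Float32Array): SQ8Vector {
  let min = Infinity;
  let max = -Infinity;
  for (const x of v) {
    if (x < min) min = x;
    if (x > max) max = x;
  }
  // Constant vectors would give scale 0; use 1 to avoid dividing by zero.
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    codes[i] = Math.round((v[i] - min) / scale);
  }
  return { codes, min, scale };
}

function dequantizeSQ8(q: SQ8Vector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) {
    out[i] = q.min + q.codes[i] * q.scale;
  }
  return out;
}
```

For a 384-dimensional embedding this shrinks storage from 1,536 bytes to 384 bytes plus two numbers, with each reconstructed component off by at most half a quantization step.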
LLM Generation
- Streaming text generation
- Typed JSON output with Zod
- Semantic response caching
- Language model middleware
- 4 providers: WebLLM (WebGPU), wllama (WASM), Transformers.js (ONNX), LiteRT
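Semantic response caching returns a stored answer when a new prompt's embedding is close enough to one already answered, skipping a full LLM call. A minimal sketch, assuming a linear scan, cosine similarity and an arbitrary 0.95 threshold; `SemanticCache` is hypothetical, and embeddings are passed in precomputed so the example does not depend on any model.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical semantic cache: returns the best cached response above the threshold.
class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];
  private threshold: number;

  constructor(threshold = 0.95) {
    this.threshold = threshold;
  }

  lookup(embedding: number[]): string | undefined {
    let best: { score: number; response: string } | undefined;
    for (const e of this.entries) {
      const score = cosine(embedding, e.embedding);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: e.response };
      }
    }
    return best?.response;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```

The threshold is the main design knob: set it too low and paraphrases with different intent get someone else's answer.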
Agents & Pipelines
- ReAct loop with tool registry
- VectorDB-backed memory
- Multi-step pipelines
- Priority inference queue
- 10 built-in step types
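A priority inference queue serializes model calls and lets urgent work, such as an interactive chat turn, run ahead of background jobs like bulk embedding. This is a single-concurrency sketch under stated assumptions: `InferenceQueue` is hypothetical, lower numbers mean higher priority, and equal priorities run in FIFO order.

```typescript
type Task<T> = () => Promise<T>;

interface Job {
  priority: number; // lower number = runs earlier
  seq: number; // insertion order, keeps FIFO among equal priorities
  run: () => Promise<void>;
}

// Hypothetical queue that runs one inference task at a time, highest priority first.
class InferenceQueue {
  private jobs: Job[] = [];
  private running = false;
  private seq = 0;

  enqueue<T>(task: Task<T>, priority = 10): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.jobs.push({
        priority,
        seq: this.seq++,
        // Promise.resolve().then(task) also captures synchronous throws as rejections.
        run: () => Promise.resolve().then(task).then(resolve, reject),
      });
      this.jobs.sort((a, b) => a.priority - b.priority || a.seq - b.seq);
      void this.drain();
    });
  }

  private async drain(): Promise<void> {
    if (this.running) return;
    this.running = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      await job.run(); // never rejects: errors are routed to the caller's promise
    }
    this.running = false;
  }
}
```

A job that has already started is never preempted; priority only decides which waiting job goes next.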
Vision & OCR
- Image classification & captioning
- Object detection & segmentation
- Optical character recognition
- Hand, pose & face landmarks
- Gesture recognition (8 classes)
Audio
- Speech-to-text transcription
- Live transcription with VAD
- Streaming TTS (29 English voices)
- Audio classification
- Kokoro TTS with speed control
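Voice activity detection (VAD) decides which audio frames contain speech, so live transcription only sends those to the speech model. The page does not describe LocalMode's VAD; this sketch uses the simplest possible approach, an RMS energy threshold per frame, with the frame size and threshold as assumed values. Production VADs are usually model-based and much more robust to background noise.

```typescript
// Energy-threshold VAD sketch: flags each fixed-size frame as speech (true) or silence (false).
function detectVoiceFrames(
  samples: Float32Array, // mono PCM in [-1, 1]
  frameSize = 480, // 30 ms at 16 kHz (assumed)
  threshold = 0.02 // RMS level treated as speech (assumed)
): boolean[] {
  const flags: boolean[] = [];
  // A trailing partial frame shorter than frameSize is ignored.
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    let sumSq = 0;
    for (let i = start; i < start + frameSize; i++) sumSq += samples[i] * samples[i];
    const rms = Math.sqrt(sumSq / frameSize);
    flags.push(rms >= threshold);
  }
  return flags;
}
```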
Security & Privacy
- AES-GCM encryption
- Named-entity PII redaction
- Differential privacy noise
- Append-only hash-chained audit log
- Zero telemetry or tracking
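A hash-chained audit log makes edits detectable: each entry stores the SHA-256 hash of the previous entry, so changing or deleting any record breaks every later link. This sketch uses Web Crypto; the `AuditEntry` shape and the `appendEntry`/`verifyChain` helpers are hypothetical, not LocalMode's log format.

```typescript
interface AuditEntry {
  index: number;
  timestamp: string;
  event: string;
  prevHash: string; // hash of the previous entry (GENESIS for the first)
  hash: string; // SHA-256 over index, timestamp, event and prevHash
}

const GENESIS = '0'.repeat(64);

async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Canonical serialization of the hashed fields.
function payload(index: number, timestamp: string, event: string, prevHash: string): string {
  return JSON.stringify([index, timestamp, event, prevHash]);
}

async function appendEntry(log: AuditEntry[], event: string): Promise<AuditEntry> {
  const index = log.length;
  const prevHash = index === 0 ? GENESIS : log[index - 1].hash;
  const timestamp = new Date().toISOString();
  const hash = await sha256Hex(payload(index, timestamp, event, prevHash));
  const entry: AuditEntry = { index, timestamp, event, prevHash, hash };
  log.push(entry);
  return entry;
}

// Recomputes every hash and checks every link; false means the log was altered.
async function verifyChain(log: AuditEntry[]): Promise<boolean> {
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    const expectedPrev = i === 0 ? GENESIS : log[i - 1].hash;
    if (e.index !== i || e.prevHash !== expectedPrev) return false;
    const recomputed = await sha256Hex(payload(e.index, e.timestamp, e.event, e.prevHash));
    if (e.hash !== recomputed) return false;
  }
  return true;
}
```

Someone able to rewrite the entire log could recompute every hash, so the latest hash is typically recorded somewhere the log writer cannot modify.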
RAG & Chunking
- Recursive & semantic chunkers
- End-to-end ingestion pipeline
- Reranking for better retrieval
- Import from Pinecone & Chroma
- Export to CSV and JSONL
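The embedding example on this page calls `chunk(documentText, { size: 512, overlap: 50 })`. The baseline behind a size/overlap setting is a sliding window in which consecutive chunks share `overlap` units, so text cut at one boundary still appears intact in the neighbouring chunk. A recursive chunker typically splits on separators such as paragraphs and sentences first; this sketch shows only the window arithmetic. It measures size in characters (an assumption, since the page does not say whether `size` counts characters or tokens), and `slidingWindowChunks` is a hypothetical name.

```typescript
interface Chunk {
  text: string;
  start: number; // character offset in the source
  end: number; // exclusive
}

// Fixed-size sliding-window chunking with overlap.
function slidingWindowChunks(text: string, size: number, overlap: number): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('require size > 0 and 0 <= overlap < size');
  }
  const step = size - overlap; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + size, text.length);
    chunks.push({ text: text.slice(start, end), start, end });
    if (end === text.length) break; // this window already reaches the end
  }
  return chunks;
}
```

With `size: 512, overlap: 50`, windows start at offsets 0, 462, 924, and so on.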
Evaluation & Tooling
- Classification & retrieval metrics
- Threshold calibration
- Device-aware model registry
- Adaptive batch sizing
- In-app DevTools widget
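Retrieval metrics compare a ranked result list against the documents known to be relevant for a query. The sketch below implements two standard ones, recall@k and reciprocal rank (averaged over queries, this gives MRR). The function names are illustrative and are not LocalMode's evaluation API.

```typescript
// recall@k: fraction of the relevant ids that appear in the top k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  let hits = 0;
  for (const id of ranked.slice(0, k)) if (relevant.has(id)) hits++;
  return hits / relevant.size;
}

// Reciprocal rank: 1 / (1-based position of the first relevant result), or 0 if none.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const pos = ranked.findIndex((id) => relevant.has(id));
  return pos === -1 ? 0 : 1 / (pos + 1);
}

// Mean reciprocal rank over several queries.
function meanReciprocalRank(runs: { ranked: string[]; relevant: Set<string> }[]): number {
  if (runs.length === 0) return 0;
  return runs.reduce((sum, r) => sum + reciprocalRank(r.ranked, r.relevant), 0) / runs.length;
}
```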
Simple, Powerful API
Function-first design with TypeScript. All operations return structured results.
Embeddings & Vector Search
$ pnpm install @localmode/core @localmode/transformers
import { createVectorDB, embed, embedMany, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384, // bge-small-en-v1.5 produces 384-dimensional vectors
});

// Chunk and embed documents (documentText is the string you want to index)
const chunks = chunk(documentText, { size: 512, overlap: 50 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });
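`db.search(query, { k: 5 })` returns the stored vectors nearest to the query. The exact answer can always be computed by brute force, which an HNSW index approximates much faster at some cost in recall. Below is a reference exact top-k cosine search, useful for measuring an index's recall on your own data; `exactTopK` and the `StoredVector` record shape are hypothetical.

```typescript
interface StoredVector {
  id: string;
  vector: Float32Array;
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exact k-nearest-neighbour search by cosine similarity: O(n * d) work per query.
function exactTopK(
  items: StoredVector[],
  query: Float32Array,
  k: number
): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(item.vector, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Comparing the ids from an index search with `exactTopK` for the same queries gives the index's recall@k directly.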
LLM Chat & Structured Output
$ pnpm install @localmode/core @localmode/webllm zod
import { streamText, generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

// Stream text from a local LLM
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});

// Append each chunk as it arrives (process.stdout does not exist in the browser)
let text = '';
for await (const chunk of result.stream) {
  text += chunk.text;
}

// Structured output with Zod schema
const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});
4 LLM Providers, 1 Interface
All providers implement the same LanguageModel interface - swap with a single line change.
| | WebLLM | wllama | Transformers.js | LiteRT |
|---|---|---|---|---|
| Runtime | WebGPU | WASM (llama.cpp) | ONNX Runtime | WebGPU / CPU WASM |
| Models | 32 curated (MLC) | 160K+ GGUF from HF | 14 curated ONNX (TJS v4) | 3 verified (.litertlm) |
| Speed | Fastest (GPU) | Good (CPU) | Good (CPU/GPU) | Fast (GPU) / Good (CPU) |
| Browser Support | Chrome/Edge 113+ | All modern browsers | All modern browsers | Chrome/Edge (WebGPU) |
| Best For | Maximum performance | Universal compatibility | Multi-task (embed + LLM) | Google on-device models |
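Because every provider exposes the same interface, falling back from a fast WebGPU provider to a universally supported WASM one reduces to trying candidates in order. This sketch uses a deliberately tiny hypothetical interface (`TextModel` with `isSupported` and `generate`) as a stand-in for LocalMode's `LanguageModel`, whose members are not listed on this page; `generateWithFallback` is likewise hypothetical.

```typescript
// Hypothetical minimal model interface; LocalMode's LanguageModel has more members.
interface TextModel {
  name: string;
  isSupported(): Promise<boolean>; // e.g. a WebGPU availability check (hypothetical)
  generate(prompt: string): Promise<string>;
}

// Try each candidate in priority order; use the first that is supported and succeeds.
async function generateWithFallback(
  candidates: TextModel[],
  prompt: string
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const model of candidates) {
    try {
      if (!(await model.isSupported())) {
        errors.push(`${model.name}: unsupported`);
        continue;
      }
      return { model: model.name, text: await model.generate(prompt) };
    } catch (err) {
      errors.push(`${model.name}: ${String(err)}`); // load or inference failure: try the next one
    }
  }
  throw new Error(`No model could handle the request (${errors.join('; ')})`);
}
```

Ordering the candidates from the table's "Fastest (GPU)" column toward "All modern browsers" gives the best speed where WebGPU exists while still working everywhere else.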
34 Demo Applications
See every feature in action at localmode.ai. All apps run 100% in the browser.
- Chat, Agents & Audio: 7 apps
- Text & NLP: 9 apps
- Vision & Images: 10 apps

Blog
Guides, tutorials, and deep dives on local-first AI, browser ML, RAG patterns, privacy-preserving inference, and more.
Featured
The 34 AI Features in Our Open-Source Showcase
Every feature running in your browser, from embeddings and vector search to LLM chat and real-time hand tracking.
Architecture
The Hybrid AI Architecture: Local for 95%, Cloud for the Rest
Route embeddings, classification, and summarization locally at $0 cost while reserving cloud APIs for frontier reasoning.
Analysis
Architecture as Policy: Why Most AI Criticism Is Really About Where the Compute Happens
15 of 20 common AI criticisms target the deployment model, not the technology. Move inference to the browser and they disappear.
Benchmark
Near Cloud-Quality AI at $0 Cost
18 local browser model categories benchmarked against OpenAI, Google, AWS, and Cohere.
Architecture
Browser LLM Providers, One API
WebLLM, Transformers.js v4, wllama, and LiteRT-LM behind a single LanguageModel interface.
Tutorial
Private RAG Chat With No Backend
Build a fully private RAG chatbot that runs entirely in the browser.
Ready to Build?
Start building local-first AI applications with comprehensive documentation, 34 example apps, and guides for every feature.