LocalMode
Run ML models entirely in your browser. Embeddings, vector search, LLM chat, vision, audio, agents, and structured output - all offline, all private.
No servers. No API keys. Your data never leaves your device.
Built for the Modern Web
AI in the Browser
Run embeddings, LLMs, classification, vision, audio, and agents directly in the browser with WebGPU and WASM.
Privacy-First
Zero telemetry. No data leaves your device. Built-in encryption, PII redaction, and differential privacy.
Zero-Dependency Core
Core package has no external dependencies. Built entirely on native Web APIs.
Offline-Ready
Models cached in IndexedDB. Works without internet after initial download. Automatic fallbacks.
Interoperable
Vercel AI SDK patterns. LangChain.js adapters. Import vectors from Pinecone and ChromaDB.
Device-Aware
Adaptive batching, model recommendations, and WebGPU acceleration based on device capabilities.
13 Packages
Modular architecture - use only what you need. Zero-dependency core provides everything; providers add ML framework integrations.
Core & React
AI Providers
@localmode/transformers
HuggingFace Transformers.js - 25 model factories for embeddings, vision, audio, OCR, and LLM inference.
@localmode/webllm
WebLLM via WebGPU - 30 curated models including DeepSeek-R1, Qwen3, Llama 3.2, Phi 3.5 Vision.
@localmode/wllama
GGUF models via llama.cpp WASM - curated catalog + 135K+ HuggingFace models, universal browser support.
@localmode/chrome-ai
Chrome Built-in AI - zero-download inference via Gemini Nano with automatic fallback.
Ecosystem
@localmode/ai-sdk
Vercel AI SDK provider for local models.
@localmode/langchain
LangChain.js adapters — drop-in local embeddings, chat, vector store, and reranker.
@localmode/devtools
In-app DevTools widget for model cache, VectorDB stats, and inference queue observability.
@localmode/pdfjs
PDF text extraction with PDF.js for document processing pipelines.
Capabilities
From embeddings and vector search to agents, vision, audio, and security - everything runs locally in the browser.
Embeddings & Vector Search
- Text and streaming embeddings
- HNSW index with WebGPU
- SQ8 + PQ compression (4–32x)
- Hybrid BM25 + semantic search
- Multimodal search via CLIP
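The SQ8 compression listed above maps each float32 vector component to an 8-bit code for roughly 4x memory savings. The following is a minimal sketch of the idea, not LocalMode's implementation:

```typescript
// Minimal sketch of SQ8 scalar quantization: each float32 component is
// mapped to an unsigned 8-bit code over the vector's value range,
// giving ~4x memory savings at a small accuracy cost.

interface SQ8Vector {
  codes: Uint8Array; // quantized components
  min: number;       // per-vector range, kept for reconstruction
  scale: number;
}

function sq8Encode(vector: number[]): SQ8Vector {
  const min = Math.min(...vector);
  const max = Math.max(...vector);
  const scale = (max - min) / 255 || 1; // avoid divide-by-zero on flat vectors
  const codes = new Uint8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round((vector[i] - min) / scale);
  }
  return { codes, min, scale };
}

function sq8Decode(q: SQ8Vector): number[] {
  return Array.from(q.codes, (c) => q.min + c * q.scale);
}

const original = [0.12, -0.5, 0.9, 0.33];
const decoded = sq8Decode(sq8Encode(original));
// Each component is recovered to within one quantization step.
```

Product quantization (PQ) pushes compression further by encoding sub-vectors against learned codebooks, which is where the upper end of the 4–32x range comes from.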
LLM Generation
- Streaming text generation
- Typed JSON output with Zod
- Semantic response caching
- Language model middleware
- 3 providers: WebGPU, WASM, ONNX
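Semantic response caching reuses a stored answer when a new prompt's embedding is close enough to a cached one. A self-contained sketch of the lookup logic (an illustration of the technique, not LocalMode's internals):

```typescript
// Semantic cache sketch: compare a prompt's embedding against cached
// entries and reuse the stored response when cosine similarity clears
// a threshold.

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];
  constructor(private threshold = 0.95) {}

  get(embedding: number[]): string | undefined {
    for (const e of this.entries) {
      if (cosine(e.embedding, embedding) >= this.threshold) return e.response;
    }
    return undefined; // cache miss: fall through to the model
  }

  set(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}

const cache = new SemanticCache(0.9);
cache.set([1, 0, 0], "cached answer");
const hit = cache.get([0.99, 0.05, 0]); // near-identical prompt
const miss = cache.get([0, 1, 0]);      // unrelated prompt
```

The threshold trades hit rate against the risk of serving a stale answer to a semantically different prompt.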
Agents & Pipelines
- ReAct loop with tool registry
- VectorDB-backed memory
- Multi-step pipelines
- Priority inference queue
- 10 built-in step types
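The ReAct pattern alternates model reasoning with tool calls until the agent produces a final answer. In this sketch a scripted policy stands in for the LLM so the control flow is visible; the tool names and step shape are hypothetical, not LocalMode's API:

```typescript
// Minimal ReAct-style loop with a tool registry. A real agent asks an
// LLM for the next thought/action; here the steps are scripted.

type Tool = (input: string) => string;

const tools: Record<string, Tool> = {
  // Toy calculator: only handles "a+b" expressions.
  calculator: (expr) => {
    const [a, b] = expr.split("+").map(Number);
    return String(a + b);
  },
  echo: (text) => text,
};

interface Step {
  thought: string;
  action?: { tool: string; input: string };
  answer?: string;
}

// Scripted stand-in for the model's reasoning turns.
const script: Step[] = [
  { thought: "I should compute the sum.", action: { tool: "calculator", input: "2+3" } },
  { thought: "I have the result.", answer: "The sum is 5." },
];

function runAgent(steps: Step[], maxSteps = 5): string {
  const observations: string[] = [];
  for (const step of steps.slice(0, maxSteps)) {
    if (step.answer) return step.answer; // agent decided to stop
    if (step.action) {
      const tool = tools[step.action.tool];
      if (tool) observations.push(tool(step.action.input)); // fed back next turn
    }
  }
  return observations.join("; ");
}

const answer = runAgent(script);
```

The `maxSteps` cap is the usual guard against a model that never converges on an answer.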
Vision & OCR
- Image classification & captioning
- Object detection & segmentation
- Optical character recognition
- Document & table QA
- Image-to-image & depth
Audio
- Speech-to-text transcription
- Text-to-speech synthesis
- Audio classification
- Offline voice notes
- Meeting summarization
Security & Privacy
- AES-GCM encryption
- Named-entity PII redaction
- Differential privacy noise
- Embedding drift detection
- Zero telemetry or tracking
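To show the shape of PII redaction, here is a deliberately simplified standalone sketch. LocalMode's feature is described as named-entity based (i.e. model-driven); this version uses regexes for two easy categories, emails and phone-like numbers, purely to illustrate the replace-with-label pattern:

```typescript
// Simplified PII redaction sketch: match known PII shapes and replace
// each with a category label. A real named-entity approach would also
// catch names, addresses, and other entities regexes cannot.

const patterns: { label: string; regex: RegExp }[] = [
  { label: "EMAIL", regex: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { label: "PHONE", regex: /\+?\d[\d\s().-]{7,}\d/g },
];

function redact(text: string): string {
  let out = text;
  for (const { label, regex } of patterns) {
    out = out.replace(regex, `[${label}]`);
  }
  return out;
}

const redacted = redact("Contact alice@example.com or +1 (555) 123-4567.");
```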
RAG & Chunking
- Recursive & semantic chunkers
- End-to-end ingestion pipeline
- Reranking for better retrieval
- Import from Pinecone & Chroma
- Export to CSV and JSONL
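The simplest of the chunking strategies above is a fixed-size sliding window with overlap, which is the idea behind calls like `chunk(documentText, { size: 512, overlap: 50 })`. A standalone sketch of that windowing (not the library's implementation):

```typescript
// Fixed-size chunking with overlap: slide a window of `size` characters
// across the text, stepping by (size - overlap) so adjacent chunks share
// context at their boundaries.

interface Chunk {
  text: string;
  start: number; // character offset into the source text
}

function chunkText(text: string, size: number, overlap: number): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: Chunk[] = [];
  const stride = size - overlap;
  for (let start = 0; start < text.length; start += stride) {
    chunks.push({ text: text.slice(start, start + size), start });
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

const parts = chunkText("a".repeat(100), 40, 10);
// 100 chars with stride 30 → windows starting at 0, 30, 60
```

Recursive and semantic chunkers improve on this by splitting at structural or meaning boundaries instead of fixed offsets, which generally retrieves better.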
Evaluation & Tooling
- Classification & retrieval metrics
- Threshold calibration
- Device-aware model registry
- Adaptive batch sizing
- In-app DevTools widget
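As a reference point for the classification metrics listed above, here is a sketch of binary precision, recall, and F1 (illustrative only, not LocalMode's API):

```typescript
// Binary classification metrics: precision = tp/(tp+fp),
// recall = tp/(tp+fn), F1 = harmonic mean of the two.

function prf1(predicted: boolean[], actual: boolean[]) {
  let tp = 0, fp = 0, fn = 0;
  for (let i = 0; i < predicted.length; i++) {
    if (predicted[i] && actual[i]) tp++;
    else if (predicted[i] && !actual[i]) fp++;
    else if (!predicted[i] && actual[i]) fn++;
  }
  const precision = tp / (tp + fp || 1); // guard empty denominators
  const recall = tp / (tp + fn || 1);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

const m = prf1([true, true, false, true], [true, false, false, true]);
// tp=2, fp=1, fn=0 → precision 2/3, recall 1, F1 0.8
```

Threshold calibration is then a sweep over the decision threshold to pick the point that maximizes a metric like F1 on held-out data.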
Simple, Powerful API
Function-first design with TypeScript. All operations return structured results.
Embeddings & Vector Search
$ pnpm install @localmode/core @localmode/transformers
```typescript
import { createVectorDB, embed, embedMany, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384,
});

// Chunk and embed documents
const chunks = chunk(documentText, { size: 512, overlap: 50 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });
```
LLM Chat & Structured Output
$ pnpm install @localmode/core @localmode/webllm
```typescript
import { streamText, generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

// Stream text from a local LLM
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});
for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}

// Structured output with Zod schema
const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});
```
3 LLM Providers, 1 Interface
All providers implement the same LanguageModel interface - swap with a single line change.
| | WebLLM | Wllama | Transformers.js |
|---|---|---|---|
| Runtime | WebGPU | WASM (llama.cpp) | ONNX Runtime |
| Models | 30 curated (MLC) | 135K+ GGUF from HF | 14 curated ONNX (TJS v4) |
| Speed | Fastest (GPU) | Good (CPU) | Good (CPU/GPU) |
| Browser Support | Chrome/Edge 113+ | All modern browsers | All modern browsers |
| Best For | Maximum performance | Universal compatibility | Multi-task (embed + LLM) |
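The "swap with a single line change" claim rests on every provider returning an object that satisfies the same interface. This sketch uses mock providers in place of the real `webllm`/`wllama` factories (the model IDs and mock shapes here are illustrative, not the library's actual types):

```typescript
// One interface, many providers: call sites depend only on LanguageModel,
// so changing backends is a one-line edit where the model is created.

interface LanguageModel {
  providerId: string;
  generate(prompt: string): string;
}

// Mock provider factories standing in for the real packages.
const makeMockProvider =
  (id: string) =>
  (modelId: string): LanguageModel => ({
    providerId: id,
    generate: (prompt) => `[${id}:${modelId}] ${prompt}`,
  });

const webllm = { languageModel: makeMockProvider("webllm") };
const wllama = { languageModel: makeMockProvider("wllama") };

// Swapping backends is the one-line change:
let model: LanguageModel = webllm.languageModel("Llama-3.2-1B-Instruct-q4f16_1-MLC");
model = wllama.languageModel("some-gguf-model"); // every call site still works
```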
32 Demo Applications
See every feature in action at localmode.ai. All apps run 100% in the browser.
Chat, Agents & Audio: 6 apps
Text & NLP: 9 apps
Vision & Images: 9 apps
Blog
Guides, tutorials, and deep dives on local-first AI, browser ML, RAG patterns, privacy-preserving inference, and more.
Read the Blog
Ready to Build?
Start building local-first AI applications with comprehensive documentation, 32 example apps, and guides for every feature.