Local-First AI for the Web

LocalMode

Run ML models entirely in your browser. Embeddings, vector search, LLM chat, vision, audio, agents, and structured output - all offline, all private.
No servers. No API keys. Your data never leaves your device.


Built for the Modern Web

AI in the Browser

Run embeddings, LLMs, classification, vision, audio, and agents directly in the browser with WebGPU and WASM.

Privacy-First

Zero telemetry. No data leaves your device. Built-in encryption, PII redaction, and differential privacy.

Zero-Dependency Core

Core package has no external dependencies. Built entirely on native Web APIs.

Offline-Ready

Models cached in IndexedDB. Works without internet after initial download. Automatic fallbacks.

Interoperable

Vercel AI SDK patterns. LangChain.js adapters. Import vectors from Pinecone and ChromaDB.

Device-Aware

Adaptive batching, model recommendations, and WebGPU acceleration based on device capabilities.

15 Packages

Modular architecture - use only what you need. The zero-dependency core provides the shared interfaces, VectorDB, pipelines, and agent framework; provider packages add ML framework integrations.

Core & React

@localmode/core

VectorDB (HNSW + WebGPU), pipelines, inference queue, model cache, agent framework, evaluation SDK, all interfaces.


@localmode/react

56 React hooks, 10 pipeline step factories, batch/list processing, and browser helpers.


AI Providers

@localmode/transformers

HuggingFace Transformers.js v4 - 26 model factories for embeddings, vision, audio, OCR, LLM inference.


@localmode/webllm

WebLLM via WebGPU - 32 curated models including DeepSeek-R1, Qwen3, Llama 3.2, Phi 3.5 Vision.


@localmode/litert

Google LiteRT-LM provider - 3 verified models (Gemma 4 E2B/E4B, Qwen3 0.6B); WebGPU + CPU WASM fallback. Text-only.


@localmode/mediapipe

Google MediaPipe Tasks - 13 verified models for landmarks, gestures, face detection, classification, segmentation, language detection, and streaming trackers.


@localmode/wllama

GGUF models via llama.cpp WASM - curated catalog + 160K+ HuggingFace models, universal browser support.


@localmode/chrome-ai

Chrome Built-in AI - zero-download inference via Gemini Nano with automatic fallback.


Ecosystem

@localmode/ai-sdk

Vercel AI SDK provider for local models.


@localmode/langchain

LangChain.js adapters - drop-in local embeddings, chat, vector store, and reranker.


@localmode/devtools

In-app DevTools widget for model cache, VectorDB stats, and inference queue observability.


@localmode/pdfjs

PDF text extraction with PDF.js for document processing pipelines.


Storage Adapters

@localmode/dexie

Dexie.js storage adapter with schema versioning and transactions.


@localmode/idb

Minimal IndexedDB storage adapter using the idb library.


@localmode/localforage

Cross-browser storage adapter with automatic fallback.


Capabilities

From embeddings and vector search to agents, vision, audio, and security - everything runs locally in the browser.

Embeddings & Vector Search

•Text and streaming embeddings
•HNSW index with WebGPU
•SQ8 + PQ compression (4–32x)
•Hybrid BM25 + semantic search
•Multimodal search via CLIP
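To make the lower end of the compression range concrete, here is a minimal sketch of 8-bit scalar quantization (SQ8): each 32-bit float becomes one byte, which is where a roughly 4x saving comes from. The names `quantizeSQ8` and `dequantizeSQ8` are hypothetical and are not LocalMode exports; the library's actual quantizer may work differently.

```typescript
// Illustrative sketch only. quantizeSQ8 / dequantizeSQ8 are hypothetical names,
// not part of @localmode/core.

interface QuantizedVector {
  codes: Uint8Array; // one byte per dimension
  min: number;       // smallest value in the original vector
  scale: number;     // width of one quantization step
}

function quantizeSQ8(vector: Float32Array): QuantizedVector {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < vector.length; i++) {
    if (vector[i] < min) min = vector[i];
    if (vector[i] > max) max = vector[i];
  }
  // Guard against a constant (or empty) vector, where max is not above min.
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round((vector[i] - min) / scale);
  }
  return { codes, min, scale };
}

function dequantizeSQ8(q: QuantizedVector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) {
    out[i] = q.min + q.codes[i] * q.scale;
  }
  return out;
}

const original = new Float32Array([0.12, -0.5, 0.33, 0.9]);
const quantized = quantizeSQ8(original);
const restored = dequantizeSQ8(quantized);

const bytesBefore = original.byteLength;       // 4 bytes per dimension
const bytesAfter = quantized.codes.byteLength; // 1 byte per dimension
console.log(bytesBefore, bytesAfter, Array.from(restored));
```

The reconstruction error per dimension is bounded by half a quantization step, which is the accuracy traded for the smaller footprint. Product quantization (PQ) is what would push the ratio higher; it is not shown here.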

LLM Generation

•Streaming text generation
•Typed JSON output with Zod
•Semantic response caching
•Language model middleware
•5 providers: WebLLM (WebGPU), wllama (WASM), Transformers.js (ONNX), LiteRT, Chrome AI
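Semantic response caching can be pictured as a lookup keyed by prompt embeddings rather than exact strings. The sketch below is an assumption about the general technique, not LocalMode's implementation: `SemanticCache`, its 0.95 threshold, and the hand-written embeddings are all hypothetical, and embeddings are passed in directly so the example runs without a model.

```typescript
// Illustrative sketch only. SemanticCache is a hypothetical class,
// not part of @localmode/core.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];
  private threshold: number;

  constructor(threshold: number = 0.95) {
    this.threshold = threshold;
  }

  // Return the response of the most similar cached prompt, if it clears the threshold.
  get(embedding: number[]): string | undefined {
    let bestScore = -1;
    let bestResponse: string | undefined = undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= this.threshold && score > bestScore) {
        bestScore = score;
        bestResponse = entry.response;
      }
    }
    return bestResponse;
  }

  set(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}

const cache = new SemanticCache(0.95);
cache.set([1, 0, 0], 'Paris is the capital of France.');

const hit = cache.get([0.99, 0.05, 0]); // near-duplicate prompt: served from cache
const miss = cache.get([0, 1, 0]);      // unrelated prompt: undefined, so call the model
console.log(hit, miss);
```

A linear scan is fine for a handful of entries; a larger cache would presumably sit on top of a vector index instead.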

Agents & Pipelines

•ReAct loop with tool registry
•VectorDB-backed memory
•Multi-step pipelines
•Priority inference queue
•10 built-in step types
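A priority inference queue lets interactive work jump ahead of background work. The following is a minimal sketch of that idea under stated assumptions: `InferenceJobQueue` and the job ids are hypothetical, a higher number is assumed to mean higher priority, and jobs with equal priority are served first-in, first-out.

```typescript
// Illustrative sketch only. InferenceJobQueue is a hypothetical class,
// not part of @localmode/core.

interface InferenceJob {
  id: string;
  priority: number; // assumption: larger number runs first
}

class InferenceJobQueue {
  private jobs: InferenceJob[] = [];

  enqueue(job: InferenceJob): void {
    // Insert before the first job with strictly lower priority,
    // so jobs of equal priority keep their arrival order.
    let index = this.jobs.length;
    for (let i = 0; i < this.jobs.length; i++) {
      if (this.jobs[i].priority < job.priority) {
        index = i;
        break;
      }
    }
    this.jobs.splice(index, 0, job);
  }

  dequeue(): InferenceJob | undefined {
    return this.jobs.shift();
  }
}

const queue = new InferenceJobQueue();
queue.enqueue({ id: 'background-embed', priority: 1 });
queue.enqueue({ id: 'user-chat', priority: 10 });
queue.enqueue({ id: 'prefetch', priority: 1 });
queue.enqueue({ id: 'user-search', priority: 10 });

const order: string[] = [];
let job = queue.dequeue();
while (job) {
  order.push(job.id);
  job = queue.dequeue();
}
console.log(order.join(', '));
// prints "user-chat, user-search, background-embed, prefetch"
```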

Vision & OCR

•Image classification & captioning
•Object detection & segmentation
•Optical character recognition
•Hand, pose & face landmarks
•Gesture recognition (8 classes)

Audio

•Speech-to-text transcription
•Live transcription with VAD
•Streaming TTS (29 English voices)
•Audio classification
•Kokoro TTS with speed control

Security & Privacy

•AES-GCM encryption
•Named-entity PII redaction
•Differential privacy noise
•Append-only hash-chained audit log
•Zero telemetry or tracking
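Differential privacy noise is commonly added with the Laplace mechanism, where the noise scale is the query's sensitivity divided by the privacy budget epsilon. Whether LocalMode uses Laplace noise specifically is an assumption; the function names below are hypothetical, and the random source is injectable only so the sketch can be checked deterministically.

```typescript
// Illustrative sketch only. laplaceNoise / privatize are hypothetical names,
// not part of @localmode/core.

function laplaceNoise(scale: number, rng: () => number = Math.random): number {
  const u = rng() - 0.5; // uniform in [-0.5, 0.5)
  // Clamp to avoid log(0) when rng() returns exactly 0.
  const tail = Math.max(1 - 2 * Math.abs(u), Number.MIN_VALUE);
  // Inverse-CDF sampling of a zero-mean Laplace distribution.
  return -scale * Math.sign(u) * Math.log(tail);
}

function privatize(
  value: number,
  sensitivity: number,
  epsilon: number,
  rng: () => number = Math.random
): number {
  if (epsilon <= 0) throw new Error('epsilon must be positive');
  return value + laplaceNoise(sensitivity / epsilon, rng);
}

// Smaller epsilon means a larger noise scale, so stronger privacy and less accuracy.
const noisyCount = privatize(100, 1, 0.5);
console.log(noisyCount);
```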

RAG & Chunking

•Recursive & semantic chunkers
•End-to-end ingestion pipeline
•Reranking for better retrieval
•Import from Pinecone & Chroma
•Export to CSV and JSONL
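Chunking with overlap is the simplest building block behind the chunkers listed above. This sketch is a fixed-window, character-based simplification, not the recursive or semantic chunkers themselves; `chunkWithOverlap` is a hypothetical name, and measuring size in characters is an assumption.

```typescript
// Illustrative sketch only. chunkWithOverlap is a hypothetical helper,
// not part of @localmode/core; sizes are in characters by assumption.

interface TextChunk {
  text: string;
  start: number; // inclusive offset into the source text
  end: number;   // exclusive offset into the source text
}

function chunkWithOverlap(text: string, size: number, overlap: number): TextChunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error('size must be positive and overlap must be in [0, size)');
  }
  const chunks: TextChunk[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + size, text.length);
    chunks.push({ text: text.slice(start, end), start, end });
    if (end === text.length) break; // the final window reached the end of the text
  }
  return chunks;
}

const sample = 'abcdefghijklmnopqrstuvwxyz';
const pieces = chunkWithOverlap(sample, 10, 3);
console.log(pieces.map((p) => p.text));
// prints [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.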

Evaluation & Tooling

•Classification & retrieval metrics
•Threshold calibration
•Device-aware model registry
•Adaptive batch sizing
•In-app DevTools widget
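Two standard retrieval metrics, recall@k and reciprocal rank, can be computed in a few lines. This is a sketch of the general definitions rather than the evaluation SDK's API; the function names and document ids are hypothetical, and the relevant ids are assumed to be unique.

```typescript
// Illustrative sketch only. recallAtK / reciprocalRank are hypothetical names,
// not part of @localmode/core.

// Fraction of the relevant documents that appear in the top k results.
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = retrieved.slice(0, k);
  let hits = 0;
  for (const id of relevant) {
    if (topK.indexOf(id) !== -1) hits++;
  }
  return hits / relevant.length;
}

// 1 / rank of the first relevant result, or 0 if none was retrieved.
function reciprocalRank(retrieved: string[], relevant: string[]): number {
  for (let i = 0; i < retrieved.length; i++) {
    if (relevant.indexOf(retrieved[i]) !== -1) return 1 / (i + 1);
  }
  return 0;
}

const retrieved = ['d3', 'd7', 'd1', 'd9', 'd2']; // ranked search results
const relevant = ['d1', 'd2', 'd4'];              // ground-truth labels

console.log(recallAtK(retrieved, relevant, 3)); // 1 of 3 relevant docs in the top 3
console.log(recallAtK(retrieved, relevant, 5)); // 2 of 3 relevant docs in the top 5
console.log(reciprocalRank(retrieved, relevant)); // first relevant doc is at rank 3
```

Averaging the reciprocal rank over many queries gives mean reciprocal rank (MRR).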

Simple, Powerful API

Function-first design with TypeScript. All operations return structured results.

Embeddings & Vector Search

Terminal

$ pnpm install @localmode/core @localmode/transformers

embeddings.ts

import { createVectorDB, embed, embedMany, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384,
});

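// Chunk and embed documents
// documentText is a placeholder: replace it with the text you want to index
const documentText = 'Replace with the text you want to index.';
const chunks = chunk(documentText, { size: 512, overlap: 50 });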
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });

LLM Chat & Structured Output

Terminal

$ pnpm install @localmode/core @localmode/webllm

chat.ts

import { streamText, generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
// import { wllama } from '@localmode/wllama'; // alternative provider
import { z } from 'zod';

// Stream text from a local LLM
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// const model = wllama.languageModel('Qwen2.5-0.5B-Instruct-Q4_K_M');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});

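// process.stdout is Node-only, so collect the streamed text for rendering in the browser
let output = '';
for await (const chunk of result.stream) {
  output += chunk.text; // update your UI here as each piece arrives
}
console.log(output);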

// Structured output with Zod schema
const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});

5 LLM Providers, 1 Interface

All providers implement the same LanguageModel interface - swap providers with a single-line change.

	WebLLM	Wllama	Transformers.js	LiteRT	Chrome AI
Runtime	WebGPU	WASM (llama.cpp)	ONNX Runtime	WebGPU / CPU WASM	Built-in (Gemini Nano)
Models	32 curated (MLC)	160K+ GGUF from HF	16 curated ONNX (TJS v4)	3 verified (.litertlm)	1 built-in (Gemini Nano)
Speed	Fastest (GPU)	Good (CPU)	Good (CPU/GPU)	Fast (GPU) / Good (CPU)	Fast (Chrome-managed)
Browser Support	Chrome/Edge 113+	All modern browsers	All modern browsers	Chrome/Edge (WebGPU)	Chrome 148+ desktop
Best For	Maximum performance	Universal compatibility	Multi-task (embed + LLM)	Google on-device models	Shipping no model files
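One way to read the table is as a selection policy driven by device capabilities. The sketch below encodes one such policy as a pure function; the policy itself, the `pickProvider` name, and the `DeviceCapabilities` shape are assumptions for illustration and are not LocalMode's model recommendation logic.

```typescript
// Illustrative sketch only. pickProvider and DeviceCapabilities are hypothetical,
// not part of any @localmode package. The ordering is one possible policy.

type ProviderName = 'chrome-ai' | 'webllm' | 'wllama';

interface DeviceCapabilities {
  hasChromeBuiltInAI: boolean; // Gemini Nano is available in this browser
  hasWebGPU: boolean;          // WebGPU is available in this browser
}

function pickProvider(caps: DeviceCapabilities): ProviderName {
  // Prefer the built-in model: there are no model files to download.
  if (caps.hasChromeBuiltInAI) return 'chrome-ai';
  // Otherwise use the GPU path when it exists ("Maximum performance" in the table).
  if (caps.hasWebGPU) return 'webllm';
  // WASM runs in all modern browsers ("Universal compatibility" in the table).
  return 'wllama';
}

console.log(pickProvider({ hasChromeBuiltInAI: false, hasWebGPU: true }));  // prints "webllm"
console.log(pickProvider({ hasChromeBuiltInAI: false, hasWebGPU: false })); // prints "wllama"
```

Because every provider exposes the same interface, the result of a function like this only needs to decide which factory creates the model; the calling code stays unchanged.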

100+ UI Components

Beyond the packages, LocalMode ships 107 copy-owned React components across 10 families - composable, local-first AI UI primitives you install with the shadcn CLI and own outright.
Browse them all at LocalMode.ai.

36 Interactive Blocks

The components composed into full, installable experiences - each running a real model entirely in the browser. See them live at LocalMode.ai/blocks.

Chat, Agents & Audio

9 blocks

Chat, Research Agent, Data Extractor, Voice Notes, Live Transcription, Meeting Assistant, Voice Explorer, Audiobook Reader, Audio Classifier

Text, Writing & NLP

9 blocks

Write, Translate, Summarize, Complete, Language Detector, Sentiment Analyzer, Text Classifier, Model Evaluator, Threshold Calibrator

Vision, Photo & Images

9 blocks

Object Detector, Live Tracker, Smart Gallery, Image Search, Duplicate Finder, Photo Categorizer, Background Remover, Image Enhancer, Image Captioner

Knowledge, Device & Privacy

9 blocks

Semantic Search, Document QA, RAG Chat, Vector Data Manager, Device Report, Model Advisor, GGUF Explorer, PII Redactor, Encrypted Vault


Blog

Guides, tutorials, and deep dives on local-first AI, browser ML, RAG patterns, privacy-preserving inference, and more.

Featured

Ready to Build?

Start building local-first AI applications with comprehensive documentation, 100+ UI components, 36 interactive blocks, and guides for every feature.



100+ UI Components

Component families: Conversation, Local-First, Audio, Results & Insights, Input Controls, Media & Vision, Data & Documents, Artifacts & Canvas, Security & Privacy


Blog

The 36 AI Blocks in Our Open-Source Gallery - All Running in Your Browser Right Now

The Hybrid AI Architecture: Local for 95%, Cloud for the Rest

Architecture as Policy: Why Most AI Criticism Is Really About Where the Compute Happens

Near Cloud-Quality AI at $0 Cost

Browser LLM Providers, One API

Private RAG Chat With No Backend
