Local-First AI for the Web

LocalMode

Run ML models entirely in your browser. Embeddings, vector search, LLM chat, vision, audio, agents, and structured output - all offline, all private.
No servers. No API keys. Your data never leaves your device.

Built for the Modern Web

AI in the Browser

Run embeddings, LLMs, classification, vision, audio, and agents directly in the browser with WebGPU and WASM.

Privacy-First

Zero telemetry. No data leaves your device. Built-in encryption, PII redaction, and differential privacy.
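As an illustration of the redaction idea, masking common PII patterns before text reaches a model or store might look like the sketch below. This is hypothetical, not LocalMode's actual redactor; the patterns and mask labels are illustrative.

```typescript
// Illustrative PII redaction sketch (not LocalMode's implementation):
// mask common identifier patterns before text ever reaches a model.
const PII_PATTERNS: [RegExp, string][] = [
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL]'],   // email addresses
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],           // US SSN format
  [/\b(?:\d[ -]?){13,16}\b/g, '[CARD]'],         // card-like digit runs
];

function redact(text: string): string {
  return PII_PATTERNS.reduce((t, [re, mask]) => t.replace(re, mask), text);
}

console.log(redact('Reach me at alex@example.com'));
// → 'Reach me at [EMAIL]'
```

Running redaction locally, before any persistence, is what keeps the raw identifiers from ever being stored.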

Zero-Dependency Core

Core package has no external dependencies. Built entirely on native Web APIs.

Offline-Ready

Models cached in IndexedDB. Works without internet after initial download. Automatic fallbacks.

Interoperable

Vercel AI SDK patterns. LangChain.js adapters. Import vectors from Pinecone and ChromaDB.

Device-Aware

Adaptive batching, model recommendations, and WebGPU acceleration based on device capabilities.
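Feature detection for this kind of adaptivity can be sketched with standard Web APIs (`navigator.gpu` for WebGPU, `navigator.deviceMemory` where supported). The selection heuristic below is hypothetical, not LocalMode's internal logic:

```typescript
// Hypothetical backend picker: prefer WebGPU when the device has headroom.
type Backend = 'webgpu' | 'wasm';

function pickBackend(hasWebGPU: boolean, deviceMemoryGB: number): Backend {
  // GPU wins when available and there is memory headroom for model weights.
  return hasWebGPU && deviceMemoryGB >= 4 ? 'webgpu' : 'wasm';
}

// Feature-detect via globalThis so this also type-checks outside the browser.
const nav = (globalThis as any).navigator;
const hasGPU = !!nav && 'gpu' in nav;            // WebGPU entry point
const memGB = nav && 'deviceMemory' in nav ? nav.deviceMemory : 4; // assume mid-range when unknown

console.log(pickBackend(hasGPU, memGB));
```

Detecting capabilities rather than sniffing user agents keeps the fallback chain robust as browsers ship WebGPU.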

13 Packages

Modular architecture - use only what you need. The zero-dependency core provides the shared primitives; provider packages add the ML framework integrations.


Simple, Powerful API

Function-first design with TypeScript. All operations return structured results.

Embeddings & Vector Search

Terminal
$ pnpm install @localmode/core @localmode/transformers
embeddings.ts
import { createVectorDB, embed, embedMany, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384,
});

// Chunk and embed documents
const chunks = chunk(documentText, { size: 512, overlap: 50 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });
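For intuition, here is a brute-force version of what a vector search ranks, assuming cosine similarity as the metric (typical for bge-style embeddings; LocalMode's actual index may differ):

```typescript
// Brute-force cosine-similarity search: a sketch of what `db.search`
// does conceptually (illustrative, not the library's implementation).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Entry { id: string; vector: number[] }

// Rank every stored vector against the query and keep the top k.
function topK(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```

A linear scan like this is O(n) per query, which is fine for thousands of chunks in a browser tab; larger corpora are where indexed search pays off.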

LLM Chat & Structured Output

Terminal
$ pnpm install @localmode/core @localmode/webllm
chat.ts
import { streamText, generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

// Stream text from a local LLM
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});

let answer = '';
for await (const chunk of result.stream) {
  answer += chunk.text; // append each streamed piece as it arrives
}

// Structured output with Zod schema
const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});
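Under the hood, structured output generally means parsing the model's raw text and validating its shape against the schema before returning it. A minimal hand-rolled sketch of that validation step (illustrative only; `generateObject` handles this for you via the Zod schema):

```typescript
// Sketch of schema validation for model output (not LocalMode's internals):
// parse the raw JSON text, then check the shape before trusting it.
interface Profile { name: string; age: number; interests: string[] }

function parseProfile(raw: string): Profile {
  const value = JSON.parse(raw);
  if (
    typeof value.name !== 'string' ||
    typeof value.age !== 'number' ||
    !Array.isArray(value.interests) ||
    !value.interests.every((i: unknown) => typeof i === 'string')
  ) {
    throw new Error('Model output did not match the schema');
  }
  return value as Profile;
}

console.log(parseProfile('{"name":"Alex","age":30,"interests":["webgpu"]}'));
```

Validating before returning is what lets the typed `object` result be safe to use without further checks.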

3 LLM Providers, 1 Interface

All providers implement the same LanguageModel interface - swap with a single line change.

                  WebLLM               Wllama                    Transformers.js
Runtime           WebGPU               WASM (llama.cpp)          ONNX Runtime
Models            30 curated (MLC)     135K+ GGUF from HF        14 curated ONNX (TJS v4)
Speed             Fastest (GPU)        Good (CPU)                Good (CPU/GPU)
Browser Support   Chrome/Edge 113+     All modern browsers       All modern browsers
Best For          Maximum performance  Universal compatibility   Multi-task (embed + LLM)
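The shared-contract idea can be sketched with a minimal interface and two stub providers. The shape below is hypothetical (the real LanguageModel interface in @localmode/core has more members); it only shows why swapping backends is a one-line change:

```typescript
// Hypothetical minimal contract that all providers share.
interface LanguageModel {
  modelId: string;
  generate(prompt: string): Promise<string>;
}

// Two stub providers implementing the same contract.
const webllmModel: LanguageModel = {
  modelId: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  generate: async (p) => `[webllm] ${p}`,
};
const wllamaModel: LanguageModel = {
  modelId: 'some-gguf-model',
  generate: async (p) => `[wllama] ${p}`,
};

// Application code depends only on the interface, never on a provider.
async function ask(model: LanguageModel, prompt: string): Promise<string> {
  return model.generate(prompt);
}
```

Because `ask` is written against the interface, moving from WebLLM to Wllama means changing only the model object you pass in.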

Blog

Guides, tutorials, and deep dives on local-first AI, browser ML, RAG patterns, privacy-preserving inference, and more.

Read the Blog

Ready to Build?

Start building local-first AI applications with comprehensive documentation, 32 example apps, and guides for every feature.