LocalMode

Introduction

Local-first, privacy-first, offline-first AI for the browser. 13 packages. 32 demo apps. Zero cloud dependencies.


LocalMode is a modular, local-first AI engine for the browser. Run embeddings, vector search, RAG pipelines, LLM chat, agents, structured output, text classification, NER, translation, summarization, speech-to-text, text-to-speech, image captioning, object detection, image segmentation, OCR, document Q&A, multimodal search, and evaluation — all directly in the browser with zero server dependencies.


Privacy by Default

All processing happens locally. No data ever leaves the user's device. Zero telemetry. Zero tracking. Built-in encryption, PII redaction, and differential privacy.

Why LocalMode?

  • Privacy — Data never leaves the device. No telemetry, no tracking, no network requests from core.
  • Offline — Works without internet after model download. Automatic fallbacks for every capability.
  • Fast — No network latency. WebGPU acceleration where available. Instant inference.
  • Free — No API costs, no rate limits, unlimited usage.
  • Universal — Works in Chrome, Edge, Firefox, and Safari. Adapts to device capabilities.
  • Interoperable — Vercel AI SDK patterns. LangChain.js adapters. Import from Pinecone/ChromaDB.

Packages

  • Core & React
  • AI Providers
  • Ecosystem
  • Storage & Utilities

Quick Start

Install packages

pnpm add @localmode/core @localmode/transformers
npm install @localmode/core @localmode/transformers
yarn add @localmode/core @localmode/transformers
bun add @localmode/core @localmode/transformers

Semantic search with embeddings

import { embed, embedMany, createVectorDB, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384,
});

// Chunk and embed documents
const chunks = chunk(documentText, { size: 512, overlap: 50 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });

LLM chat with streaming

Three providers implement the same LanguageModel interface — choose based on your needs:

import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

// Pick any provider — all share the same LanguageModel interface
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// const model = wllama.languageModel('Llama-3.2-1B-Instruct-Q4_K_M');
// const model = transformers.languageModel('onnx-community/Qwen3-0.6B-ONNX');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});

let output = '';
for await (const chunk of result.stream) {
  output += chunk.text; // append to the UI as chunks arrive
}

Structured output

import { generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});

React hooks

import { useChat, useEmbed, useClassify } from '@localmode/react';
import { webllm } from '@localmode/webllm';

function ChatApp() {
  const { messages, sendMessage, isStreaming } = useChat({
    model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  });

  return <ChatUI messages={messages} onSend={sendMessage} loading={isStreaming} />;
}

Features

Core AI Functions

| Feature | Functions | Description |
|---|---|---|
| Embeddings | embed(), embedMany(), streamEmbedMany() | Text embeddings with streaming and batching |
| Multimodal Embeddings | embedImage(), embedManyImages() | CLIP-based text-image cross-modal search |
| Streaming LLM | streamText(), generateText() | Streaming and complete text generation |
| Structured Output | generateObject(), streamObject() | Typed JSON generation with Zod schema validation |
| Classification | classify(), classifyZeroShot(), classifyMany() | Sentiment, intent, topic classification |
| NER | extractEntities() | Named entity recognition |
| Reranking | rerank() | Document reranking for improved RAG |
| Translation | translate() | Multi-language translation (20+ languages) |
| Summarization | summarize() | Text summarization |
| Question Answering | answerQuestion() | Extractive QA with confidence scores |
| Fill-Mask | fillMask() | Masked token prediction (BERT-style) |
| OCR | extractText() | Optical character recognition |
| Document QA | askDocument(), askTable() | Visual document and table understanding |
| Audio | transcribe(), synthesizeSpeech(), classifyAudio() | Speech-to-text, TTS, audio classification |
| Vision | classifyImage(), captionImage(), detectObjects(), segmentImage() | Image processing and analysis |
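Most of these functions are model-driven, but the idea behind zero-shot classification is easy to sketch. The snippet below is a conceptual, dependency-free illustration of the embedding-similarity variant (real classifyZeroShot() implementations typically rely on an NLI model; this is not the LocalMode code): embed the input and each candidate label, then pick the nearest label.

```typescript
// Toy zero-shot classification via embedding similarity.
// Hand-written 3-d vectors stand in for a real embedding model.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.hypot(...v);
  return dot / (norm(a) * norm(b));
}

function zeroShot(text: number[], labels: Record<string, number[]>): string {
  let best = '';
  let bestScore = -Infinity;
  for (const [label, vec] of Object.entries(labels)) {
    const score = cosine(text, vec);
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

console.log(
  zeroShot([0.9, 0.1, 0], {
    positive: [1, 0, 0],
    negative: [0, 1, 0],
  })
); // 'positive'
```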

Vector Database & RAG

| Feature | Functions | Description |
|---|---|---|
| Vector Database | createVectorDB() | HNSW index, IndexedDB persistence, cross-tab sync, typed metadata |
| Semantic Search | semanticSearch(), streamSemanticSearch() | Query-time embed + search in one call |
| Quantization | createVectorDB({ quantization }) | SQ8 (4x) and Product Quantization (8-32x compression) |
| WebGPU Search | createGPUDistanceComputer() | WGSL compute shaders for batch distance computation |
| Hybrid Search | createHybridSearch(), reciprocalRankFusion() | BM25 keyword + vector semantic search fusion |
| Chunking | chunk(), semanticChunk(), codeChunk(), markdownChunk() | Recursive, semantic, code-aware, and markdown chunking |
| Pipelines | createPipeline() | Composable multi-step workflows with 10 built-in step types |
| Inference Queue | createInferenceQueue() | Priority-based task scheduling with concurrency control |
| Semantic Cache | createSemanticCache() | Cache LLM responses using embedding similarity |
| Import/Export | importFrom(), exportToCSV(), exportToJSONL() | Migrate vectors from Pinecone, ChromaDB, CSV, JSONL |
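reciprocalRankFusion() refers to the standard RRF formula for merging ranked lists. As a self-contained sketch (not the LocalMode implementation), each list awards a document a score of 1 / (k + rank), and scores are summed across lists:

```typescript
// Reciprocal rank fusion over any number of ranked ID lists.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // Each list contributes 1 / (k + rank); k dampens head-of-list dominance
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}

// Fuse a BM25 keyword ranking with a vector similarity ranking
const fused = rrf([
  ['doc-b', 'doc-a', 'doc-c'], // BM25 order
  ['doc-b', 'doc-d', 'doc-a'], // vector order
]);
const top = [...fused.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
console.log(top[0]); // 'doc-b' (ranked first by both retrievers)
```

Documents that appear high in both lists dominate, which is why RRF is a robust default for hybrid search without score normalization.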

Agents & Evaluation

| Feature | Functions | Description |
|---|---|---|
| Agent Framework | createAgent(), runAgent() | ReAct loop with tool registry and VectorDB-backed memory |
| Evaluation SDK | evaluateModel(), accuracy(), bleuScore(), ndcg() | Classification, generation, and retrieval metrics |
| Threshold Calibration | calibrateThreshold(), getDefaultThreshold() | Empirical similarity thresholds from corpus data |
| Model Registry | recommendModels(), registerModel() | Curated model catalog with device-aware recommendations |
| Adaptive Batching | computeOptimalBatchSize() | Device-aware batch sizing for optimal throughput |
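The retrieval metric behind ndcg() is worth unpacking. The standard formula, sketched below in plain TypeScript (not the LocalMode implementation), discounts graded relevance by position and normalizes against the ideal ordering:

```typescript
// Discounted cumulative gain: relevance at rank i is divided by log2(i + 2)
function dcg(rels: number[]): number {
  return rels.reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// Normalized DCG: divide by the DCG of the best possible ordering
function ndcg(rels: number[]): number {
  const ideal = dcg([...rels].sort((a, b) => b - a));
  return ideal === 0 ? 0 : dcg(rels) / ideal;
}

console.log(ndcg([3, 2, 3, 0, 1])); // ≈ 0.972: a relevant doc sits too low
console.log(ndcg([3, 3, 2, 1, 0])); // 1: already ideally ordered
```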

Security & Privacy

| Feature | Functions | Description |
|---|---|---|
| Encryption | encrypt(), decrypt(), deriveKey() | Web Crypto API encryption, PBKDF2 key derivation |
| PII Redaction | redactPII(), piiRedactionMiddleware() | Named entity based PII detection and redaction |
| Differential Privacy | dpEmbeddingMiddleware(), createPrivacyBudget() | DP noise injection for embeddings and classification |
| Drift Detection | checkModelCompatibility(), reindexCollection() | Detect model changes, auto-reindex collections |

LLM Provider Comparison

| | WebLLM | Wllama | Transformers.js |
|---|---|---|---|
| Runtime | WebGPU | WASM (llama.cpp) | ONNX Runtime |
| Models | 30 curated (MLC) | 135K+ GGUF from HuggingFace | 14 ONNX (TJS v4) |
| Speed | Fastest (GPU) | Good (CPU) | Good (CPU/GPU) |
| Vision | Phi 3.5 Vision | — | Qwen3.5 Vision |
| Browser Support | Chrome/Edge 113+ | All modern browsers | All modern browsers |
| Best For | Maximum performance | Universal compatibility, model variety | Multi-task (embeddings + LLM in one package) |

Architecture

LocalMode follows a "zero-dependency core, thin provider wrappers" architecture:

┌───────────────────────────────────────────────────────────────────────┐
│                          Your Application                             │
├───────────────────────────────────────────────────────────────────────┤
│                    @localmode/react  (46 hooks)                       │
├───────────────────────┬───────────────────────┬───────────────────────┤
│   @localmode/ai-sdk   │  @localmode/langchain │  @localmode/devtools  │
├───────────────────────┴───────────────────────┴───────────────────────┤
│                          @localmode/core                              │
│                                                                       │
│  VectorDB       Embeddings       Generation       Agents & Pipelines  │
│  (HNSW +        + Multimodal     + Structured     + Evaluation        │
│   WebGPU)                          Output         + Metrics           │
│                                                                       │
│  Security       Middleware        Import/Export    Model Cache        │
│  (DP, PII,      System                            + Registry          │
│   Crypto)                                                             │
├───────────────────────────────────────────────────────────────────────┤
│                        Provider Packages                              │
│                                                                       │
│  @localmode/transformers    HF Transformers.js       25 factories     │
│  @localmode/webllm          WebGPU                      30 models     │
│  @localmode/wllama          llama.cpp WASM           135K+ models     │
│  @localmode/chrome-ai       Gemini Nano             zero-download     │
├───────────────────────────────────────────────────────────────────────┤
│                          Browser APIs                                 │
│                                                                       │
│         WebGPU  ·  IndexedDB  ·  Web Workers  ·  Web Crypto           │
└───────────────────────────────────────────────────────────────────────┘
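The provider contract implied by this diagram can be pictured as a small shared interface. The sketch below is hypothetical: only the LanguageModel name appears in the docs above, and the member names are invented for illustration, so the real interface will differ.

```typescript
// Hypothetical shape of the shared provider contract: core defines the
// interface, each provider package (webllm, wllama, transformers,
// chrome-ai) returns an implementation.
interface LanguageModel {
  readonly modelId: string;
  generate(options: { prompt: string; maxTokens?: number }): Promise<{ text: string }>;
  stream(options: { prompt: string; maxTokens?: number }): AsyncIterable<{ text: string }>;
}

// A toy provider that echoes the prompt back in uppercase, showing how
// thin a wrapper over a runtime can be
const echoProvider: LanguageModel = {
  modelId: 'echo-1',
  async generate({ prompt }) {
    return { text: prompt.toUpperCase() };
  },
  async *stream({ prompt }) {
    for (const word of prompt.split(' ')) yield { text: word + ' ' };
  },
};

echoProvider.generate({ prompt: 'hello' }).then(({ text }) => console.log(text)); // HELLO
```

Because every provider satisfies the same contract, swapping WebGPU, WASM, or ONNX backends is a one-line change in application code.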

Demo Applications

See LocalMode in action at localmode.ai — 32 apps showcasing every feature.

Browser Compatibility

| Browser | WebGPU | WASM | IndexedDB | Workers | Chrome AI |
|---|---|---|---|---|---|
| Chrome 138+ | Yes | Yes | Yes | Yes | Yes |
| Edge 138+ | Yes | Yes | Yes | Yes | Yes |
| Firefox 75+ | Nightly | Yes | Yes | Yes | No |
| Safari 18+ | Yes | Yes | Yes | Partial | No |

Platform Notes

  • Chrome AI: Zero-download inference via Gemini Nano (fallback to Transformers.js)
  • WebGPU: 3-5x faster inference (fallback to WASM)
  • Safari/iOS: Private browsing blocks IndexedDB — use MemoryStorage fallback
  • Firefox: WebGPU only in Nightly — WASM fallback is automatic
  • SharedArrayBuffer: Requires cross-origin isolation for some features
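The fallback rules above amount to a small capability check. Here is a hedged sketch of that logic; the backend and storage names are illustrative, not part of the LocalMode API:

```typescript
type Backend = 'webgpu' | 'wasm';
type Storage = 'indexeddb' | 'memory';

interface Capabilities {
  webgpu: boolean;    // e.g. 'gpu' in navigator
  indexeddb: boolean; // false in Safari private browsing
}

// Mirror the automatic fallbacks: WebGPU -> WASM, IndexedDB -> memory
function pickRuntime(caps: Capabilities): { backend: Backend; storage: Storage } {
  return {
    backend: caps.webgpu ? 'webgpu' : 'wasm',
    storage: caps.indexeddb ? 'indexeddb' : 'memory',
  };
}

// In a real browser the capabilities would be detected roughly like:
// const caps = { webgpu: 'gpu' in navigator, indexeddb: 'indexedDB' in globalThis };
console.log(pickRuntime({ webgpu: false, indexeddb: true })); // { backend: 'wasm', storage: 'indexeddb' }
```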
