Local-First AI for the Web

LocalMode

Run ML models entirely in your browser. Embeddings, vector search, LLM chat, vision, audio, agents, and structured output - all offline, all private.
No servers. No API keys. Your data never leaves your device.

Built for the Modern Web

AI in the Browser

Run embeddings, LLMs, classification, vision, audio, and agents directly in the browser with WebGPU and WASM.

Privacy-First

Zero telemetry. No data leaves your device. Built-in encryption, PII redaction, and differential privacy.
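As an illustration of the redaction idea, masking common PII patterns before text reaches a model or store might look like the sketch below. This is hypothetical, not LocalMode's actual redactor; the patterns and mask labels are illustrative.

```typescript
// Illustrative PII redaction sketch (not LocalMode's implementation):
// mask common identifier patterns before text ever reaches a model.
const PII_PATTERNS: [RegExp, string][] = [
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL]'],   // email addresses
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],           // US SSN format
  [/\b(?:\d[ -]?){13,16}\b/g, '[CARD]'],         // card-like digit runs
];

function redact(text: string): string {
  return PII_PATTERNS.reduce((t, [re, mask]) => t.replace(re, mask), text);
}

console.log(redact('Reach me at alex@example.com'));
// → 'Reach me at [EMAIL]'
```

Running redaction locally, before any persistence, is what keeps the raw identifiers from ever being stored.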

Zero-Dependency Core

Core package has no external dependencies. Built entirely on native Web APIs.

Offline-Ready

Models cached in IndexedDB. Works without internet after initial download. Automatic fallbacks.

Interoperable

Vercel AI SDK patterns. LangChain.js adapters. Import vectors from Pinecone and ChromaDB.

Device-Aware

Adaptive batching, model recommendations, and WebGPU acceleration based on device capabilities.
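Feature detection for this kind of adaptivity can be sketched with standard Web APIs (`navigator.gpu` for WebGPU, `navigator.deviceMemory` where supported). The selection heuristic below is hypothetical, not LocalMode's internal logic:

```typescript
// Hypothetical backend picker: prefer WebGPU when the device has headroom.
type Backend = 'webgpu' | 'wasm';

function pickBackend(hasWebGPU: boolean, deviceMemoryGB: number): Backend {
  // GPU wins when available and there is memory headroom for model weights.
  return hasWebGPU && deviceMemoryGB >= 4 ? 'webgpu' : 'wasm';
}

// Feature-detect via globalThis so this also type-checks outside the browser.
const nav = (globalThis as any).navigator;
const hasGPU = !!nav && 'gpu' in nav;            // WebGPU entry point
const memGB = nav && 'deviceMemory' in nav ? nav.deviceMemory : 4; // assume mid-range when unknown

console.log(pickBackend(hasGPU, memGB));
```

Detecting capabilities rather than sniffing user agents keeps the fallback chain robust as browsers ship WebGPU.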

13 Packages

Modular architecture - use only what you need. The zero-dependency core provides the shared primitives; provider packages add the ML framework integrations.


Simple, Powerful API

Function-first design with TypeScript. All operations return structured results.

Embeddings & Vector Search

Terminal
$ pnpm install @localmode/core @localmode/transformers
embeddings.ts
import { createVectorDB, embed, embedMany, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Create vector database with typed metadata
const db = await createVectorDB<{ text: string }>({
  name: 'docs',
  dimensions: 384,
});

// Chunk and embed documents
const chunks = chunk(documentText, { size: 512, overlap: 50 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map((c) => c.text),
});

// Store vectors
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: query } = await embed({ model, value: 'What is AI?' });
const results = await db.search(query, { k: 5 });
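For intuition, here is a brute-force version of what a vector search ranks, assuming cosine similarity as the metric (typical for bge-style embeddings; LocalMode's actual index may differ):

```typescript
// Brute-force cosine-similarity search: a sketch of what `db.search`
// does conceptually (illustrative, not the library's implementation).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Entry { id: string; vector: number[] }

// Rank every stored vector against the query and keep the top k.
function topK(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```

A linear scan like this is O(n) per query, which is fine for thousands of chunks in a browser tab; larger corpora are where indexed search pays off.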

LLM Chat & Structured Output

Terminal
$ pnpm install @localmode/core @localmode/webllm
chat.ts
import { streamText, generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

// Stream text from a local LLM
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing simply',
  maxTokens: 500,
});

let answer = '';
for await (const chunk of result.stream) {
  answer += chunk.text; // append each streamed piece as it arrives
}

// Structured output with Zod schema
const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      name: z.string(),
      age: z.number(),
      interests: z.array(z.string()),
    })
  ),
  prompt: 'Generate a profile for a software engineer named Alex',
});
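Under the hood, structured output generally means parsing the model's raw text and validating its shape against the schema before returning it. A minimal hand-rolled sketch of that validation step (illustrative only; `generateObject` handles this for you via the Zod schema):

```typescript
// Sketch of schema validation for model output (not LocalMode's internals):
// parse the raw JSON text, then check the shape before trusting it.
interface Profile { name: string; age: number; interests: string[] }

function parseProfile(raw: string): Profile {
  const value = JSON.parse(raw);
  if (
    typeof value.name !== 'string' ||
    typeof value.age !== 'number' ||
    !Array.isArray(value.interests) ||
    !value.interests.every((i: unknown) => typeof i === 'string')
  ) {
    throw new Error('Model output did not match the schema');
  }
  return value as Profile;
}

console.log(parseProfile('{"name":"Alex","age":30,"interests":["webgpu"]}'));
```

Validating before returning is what lets the typed `object` result be safe to use without further checks.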

3 LLM Providers, 1 Interface

All providers implement the same LanguageModel interface - swap with a single line change.

                  WebLLM               Wllama                    Transformers.js
Runtime           WebGPU               WASM (llama.cpp)          ONNX Runtime
Models            30 curated (MLC)     135K+ GGUF from HF        14 curated ONNX (TJS v4)
Speed             Fastest (GPU)        Good (CPU)                Good (CPU/GPU)
Browser Support   Chrome/Edge 113+     All modern browsers       All modern browsers
Best For          Maximum performance  Universal compatibility   Multi-task (embed + LLM)
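The shared-contract idea can be sketched with a minimal interface and two stub providers. The shape below is hypothetical (the real LanguageModel interface in @localmode/core has more members); it only shows why swapping backends is a one-line change:

```typescript
// Hypothetical minimal contract that all providers share.
interface LanguageModel {
  modelId: string;
  generate(prompt: string): Promise<string>;
}

// Two stub providers implementing the same contract.
const webllmModel: LanguageModel = {
  modelId: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  generate: async (p) => `[webllm] ${p}`,
};
const wllamaModel: LanguageModel = {
  modelId: 'some-gguf-model',
  generate: async (p) => `[wllama] ${p}`,
};

// Application code depends only on the interface, never on a provider.
async function ask(model: LanguageModel, prompt: string): Promise<string> {
  return model.generate(prompt);
}
```

Because `ask` is written against the interface, moving from WebLLM to Wllama means changing only the model object you pass in.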

Blog

Guides, tutorials, and deep dives on local-first AI, browser ML, RAG patterns, privacy-preserving inference, and more.

Read the Blog

Ready to Build?

Start building local-first AI applications with comprehensive documentation, 32 example apps, and guides for every feature.