LocalMode

Getting Started

Install LocalMode and build your first local AI application in minutes.

This guide walks you through installing LocalMode and building your first local-first AI application — from embeddings and semantic search to LLM chat and React hooks.

Installation

Install packages

The minimum setup requires @localmode/core and at least one provider:

pnpm install @localmode/core @localmode/transformers
npm install @localmode/core @localmode/transformers
yarn add @localmode/core @localmode/transformers
bun add @localmode/core @localmode/transformers

All underlying ML dependencies (like @huggingface/transformers) are automatically installed with the provider packages.

Optional packages

Add more capabilities as needed:

# LLM chat (pick one or more)
pnpm install @localmode/webllm          # WebGPU — fastest, 30 curated models
pnpm install @localmode/wllama          # WASM — 135K+ GGUF models, all browsers

# React hooks
pnpm install @localmode/react

# Ecosystem
pnpm install @localmode/ai-sdk          # Vercel AI SDK compatibility
pnpm install @localmode/langchain       # LangChain.js adapters
pnpm install @localmode/chrome-ai       # Chrome Built-in AI (Gemini Nano)
pnpm install @localmode/devtools        # In-app DevTools widget

# Storage adapters
pnpm install @localmode/dexie           # Dexie.js storage
pnpm install @localmode/pdfjs           # PDF text extraction

Configure bundler (if needed)

For Next.js, add to next.config.js:

next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  webpack: (config) => {
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      'onnxruntime-node$': false,
    };
    return config;
  },
  experimental: {
    serverComponentsExternalPackages: ['sharp', 'onnxruntime-node'],
  },
};

module.exports = nextConfig;

Vite works out of the box. If you load models inside a web worker, you may need to exclude @huggingface/transformers from dependency pre-bundling:

vite.config.ts
export default defineConfig({
  optimizeDeps: {
    exclude: ['@huggingface/transformers'],
  },
});

Your First Embedding

Let's create your first embedding:

first-embedding.ts
import { embed } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Generate embedding
const { embedding, usage } = await embed({
  model,
  value: 'Hello, world!',
});

console.log('Embedding dimensions:', embedding.length); // 384
console.log('Tokens used:', usage.tokens);

First Load

The first time you use a model, it downloads from HuggingFace Hub and caches in IndexedDB. Subsequent loads are instant.

Build a Semantic Search App

Here's a complete example of building semantic search:

semantic-search.ts
import { createVectorDB, embed, embedMany, semanticSearch } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// 1. Setup
const model = transformers.embedding('Xenova/bge-small-en-v1.5');
const db = await createVectorDB<{ text: string }>({
  name: 'my-documents',
  dimensions: 384,
});

// 2. Sample documents
const documents = [
  'Machine learning is a subset of artificial intelligence.',
  'Neural networks are inspired by biological neurons.',
  'Deep learning uses multiple layers of neural networks.',
  'Natural language processing handles human language.',
  'Computer vision enables machines to interpret images.',
];

// 3. Generate embeddings
const { embeddings } = await embedMany({
  model,
  values: documents,
});

// 4. Store in vector database
await db.addMany(
  documents.map((text, i) => ({
    id: `doc-${i}`,
    vector: embeddings[i],
    metadata: { text },
  }))
);

// 5. Search
const results = await semanticSearch({
  db,
  model,
  query: 'How do neural networks work?',
  k: 3,
});

console.log('Results:');
results.forEach((r, i) => {
  console.log(`${i + 1}. ${r.metadata.text} (score: ${r.score.toFixed(3)})`);
});

Output:

Results:
1. Neural networks are inspired by biological neurons. (score: 0.842)
2. Deep learning uses multiple layers of neural networks. (score: 0.756)
3. Machine learning is a subset of artificial intelligence. (score: 0.623)
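The score is a similarity between the query vector and each stored vector; with normalized embeddings this is typically cosine similarity. A minimal sketch of the math in plain TypeScript, independent of LocalMode:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical directions score 1, orthogonal directions score 0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Documents about neural networks land closer to a "How do neural networks work?" query vector than unrelated documents, which is exactly the ranking shown above.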

Add RAG with Chunking

For longer documents, use chunking and reranking:

rag-example.ts
import { createVectorDB, chunk, ingest, semanticSearch, rerank } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Setup
const embeddingModel = transformers.embedding('Xenova/bge-small-en-v1.5');
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

const db = await createVectorDB({
  name: 'documents',
  dimensions: 384,
});

// Load and chunk a document
const documentText = `
  Machine learning is revolutionizing how we build software...
  (your long document here)
`;

const chunks = chunk(documentText, {
  strategy: 'recursive',
  size: 512,
  overlap: 50,
});

// Ingest with automatic embedding
await ingest({
  db,
  model: embeddingModel,
  documents: chunks.map((c) => ({
    text: c.text,
    metadata: { start: c.startIndex, end: c.endIndex },
  })),
});

// Search and rerank for better accuracy
const query = 'What are the applications of machine learning?';

const searchResults = await semanticSearch({
  db,
  model: embeddingModel,
  query,
  k: 10, // Get more candidates for reranking
});

const { results: reranked } = await rerank({
  model: rerankerModel,
  query,
  documents: searchResults.map((r) => r.metadata.text as string),
  topK: 3,
});

console.log('Top results after reranking:');
reranked.forEach((r, i) => {
  console.log(`${i + 1}. Score: ${r.score.toFixed(3)}`);
  console.log(`   ${r.text.substring(0, 100)}...`);
});
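The chunk() call above splits long text into overlapping windows so that facts straddling a boundary still appear intact in at least one chunk. A simplified character-based sketch of that idea (LocalMode's 'recursive' strategy is assumed to additionally respect paragraph and sentence boundaries; this sketch uses fixed-size windows only):

```typescript
interface TextChunk {
  text: string;
  startIndex: number;
  endIndex: number;
}

// Simplified fixed-size chunker: each chunk is up to `size` characters,
// and consecutive chunks share `overlap` characters of context.
function chunkBySize(text: string, size: number, overlap: number): TextChunk[] {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks: TextChunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + size, text.length);
    chunks.push({ text: text.slice(start, end), startIndex: start, endIndex: end });
    if (end === text.length) break;
  }
  return chunks;
}

const demo = chunkBySize('x'.repeat(1000), 512, 50);
console.log(demo.map((c) => [c.startIndex, c.endIndex])); // [[0, 512], [462, 974], [924, 1000]]
```

With size 512 and overlap 50, every chunk after the first repeats the last 50 characters of its predecessor, which is why the startIndex/endIndex metadata stored during ingest() overlaps between neighbors.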

Add LLM Chat

Three providers implement the same LanguageModel interface — choose based on your needs:

llm-chat.ts
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

// Pick any provider — all share the same LanguageModel interface
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// const model = wllama.languageModel('Llama-3.2-1B-Instruct-Q4_K_M');
// const model = transformers.languageModel('onnx-community/Qwen3-0.6B-ONNX');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing in simple terms',
  maxTokens: 500,
});

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}
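The stream is an async iterable of { text } chunks, so you can accumulate the full response while rendering incrementally. A generic sketch with a stand-in async generator (the stand-in replaces the real model stream for illustration):

```typescript
interface TextChunkEvent {
  text: string;
}

// Stand-in for result.stream: any AsyncIterable of { text } chunks works.
async function* fakeStream(): AsyncGenerator<TextChunkEvent> {
  for (const piece of ['Quantum ', 'computing ', 'explained.']) {
    yield { text: piece };
  }
}

// Render incrementally while also collecting the full text.
async function collect(stream: AsyncIterable<TextChunkEvent>): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    full += chunk.text; // in an app, also append chunk.text to the UI here
  }
  return full;
}

collect(fakeStream()).then((full) => console.log(full)); // "Quantum computing explained."
```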

Combine with RAG results for grounded answers:

rag-with-llm.ts
// After getting search results from above...
const context = reranked.map((r) => r.text).join('\n\n');

const result = await streamText({
  model,
  prompt: `Based on the following context, answer the question.

Context:
${context}

Question: ${query}

Answer:`,
});

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}

Structured Output

Generate typed JSON objects with schema validation:

structured-output.ts
import { generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      title: z.string(),
      summary: z.string(),
      tags: z.array(z.string()),
      sentiment: z.enum(['positive', 'negative', 'neutral']),
    })
  ),
  prompt: 'Analyze: "LocalMode makes AI accessible to everyone by running it in the browser"',
});

console.log(object.title);     // string
console.log(object.tags);      // string[]
console.log(object.sentiment); // 'positive' | 'negative' | 'neutral'

React Hooks

For React apps, @localmode/react provides hooks for every core function with built-in loading states, error handling, and cancellation:

ChatApp.tsx
import { useChat } from '@localmode/react';
import { webllm } from '@localmode/webllm';

function ChatApp() {
  const { messages, sendMessage, isStreaming, cancel } = useChat({
    model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  });

  return (
    <div>
      {messages.map((msg) => (
        <div key={msg.id}>
          <strong>{msg.role}:</strong> {msg.content}
        </div>
      ))}
      <button onClick={() => sendMessage('Hello!')}>Send</button>
      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
SemanticSearch.tsx
import { useSemanticSearch } from '@localmode/react';
import { transformers } from '@localmode/transformers';
import { db } from './db'; // your createVectorDB() instance

// Create the model once at module scope so it isn't re-created on every render
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

function SearchApp() {
  const { data, isLoading, execute } = useSemanticSearch({ model, db });

  return (
    <div>
      <input onChange={(e) => execute({ query: e.target.value, k: 5 })} />
      {isLoading && <p>Searching...</p>}
      {data?.map((r) => <p key={r.id}>{r.metadata.text}</p>)}
    </div>
  );
}

Project Structure

A typical LocalMode project might look like:

src/
  lib/
    models.ts      # model instances, shared across the app
    db.ts          # vector database singleton
  App.tsx
package.json
src/lib/models.ts
import { transformers } from '@localmode/transformers';
import { webllm } from '@localmode/webllm';

// Model instances (created once, reused everywhere)
export const embeddingModel = transformers.embedding('Xenova/bge-small-en-v1.5');
export const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');
export const classifierModel = transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
export const llm = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
src/lib/db.ts
import { createVectorDB } from '@localmode/core';

// Cache the promise rather than the resolved instance: concurrent callers
// then share one in-flight createVectorDB() call instead of racing to
// create two databases with the same name.
let dbPromise: ReturnType<typeof createVectorDB> | null = null;

export function getDB() {
  if (!dbPromise) {
    dbPromise = createVectorDB({
      name: 'my-app',
      dimensions: 384,
    });
  }
  return dbPromise;
}

What You Can Build

Beyond embeddings, RAG, and LLM chat, LocalMode provides a full suite of local AI capabilities:

Category | Functions | Use Cases
Classification | classify(), classifyZeroShot(), classifyMany() | Sentiment analysis, email routing, content moderation
NER | extractEntities() | Document redaction, data extraction
Reranking | rerank() | Improved RAG accuracy with cross-encoder scoring
Vision | captionImage(), detectObjects(), segmentImage() | Image captioning, object detection, background removal
Multimodal Search | embedImage(), embedManyImages() | Cross-modal text-to-image search with CLIP
Audio | transcribe(), synthesizeSpeech() | Voice notes, meeting transcription, audiobook creation
Translation | translate() | Multi-language translation (20+ languages)
Summarization | summarize() | Text and document summarization
Structured Output | generateObject(), streamObject() | Typed JSON generation with Zod schema validation
Agents | createAgent(), runAgent() | ReAct loop with tool registry and VectorDB-backed memory
Document QA | askDocument(), askTable() | Invoice Q&A, form and table understanding
OCR | extractText() | Document scanning, text extraction from images
Fill-Mask | fillMask() | Autocomplete, masked token prediction
Question Answering | answerQuestion() | Extractive QA with confidence scores
Security | encrypt(), decrypt(), redactPII() | Encrypted vaults, PII redaction, differential privacy
Evaluation | evaluateModel(), accuracy(), bleuScore() | Model quality metrics for classification, generation, retrieval
Pipelines | createPipeline() | Composable multi-step workflows with 10 step types
Import/Export | importFrom(), exportToCSV() | Migrate vectors from Pinecone, ChromaDB, CSV, JSONL

All of these work offline after the initial model download, with no server or API key required.
