# Getting Started

Install LocalMode and build your first local AI application in minutes.

This guide walks you through installing LocalMode and building your first local-first AI application — from embeddings and semantic search to LLM chat and React hooks.
## Installation

### Install packages
The minimum setup requires `@localmode/core` and at least one provider:

```bash
# pnpm
pnpm install @localmode/core @localmode/transformers

# npm
npm install @localmode/core @localmode/transformers

# yarn
yarn add @localmode/core @localmode/transformers

# bun
bun add @localmode/core @localmode/transformers
```

All underlying ML dependencies (like `@huggingface/transformers`) are installed automatically with the provider packages.
### Optional packages
Add more capabilities as needed:
```bash
# LLM chat (pick one or more)
pnpm install @localmode/webllm      # WebGPU — fastest, 30 curated models
pnpm install @localmode/wllama      # WASM — 135K+ GGUF models, all browsers

# React hooks
pnpm install @localmode/react

# Ecosystem
pnpm install @localmode/ai-sdk      # Vercel AI SDK compatibility
pnpm install @localmode/langchain   # LangChain.js adapters
pnpm install @localmode/chrome-ai   # Chrome Built-in AI (Gemini Nano)
pnpm install @localmode/devtools    # In-app DevTools widget

# Storage adapters
pnpm install @localmode/dexie       # Dexie.js storage
pnpm install @localmode/pdfjs      # PDF text extraction
```

### Configure bundler (if needed)

For Next.js, add to `next.config.js`:
```js
/** @type {import('next').NextConfig} */
const nextConfig = {
  webpack: (config) => {
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      'onnxruntime-node$': false,
    };
    return config;
  },
  experimental: {
    serverComponentsExternalPackages: ['sharp', 'onnxruntime-node'],
  },
};

module.exports = nextConfig;
```

For Vite, models work out of the box. For workers, you may need:
```js
import { defineConfig } from 'vite';

export default defineConfig({
  optimizeDeps: {
    exclude: ['@huggingface/transformers'],
  },
});
```

## Your First Embedding
Let's create your first embedding:
```ts
import { embed } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create an embedding model
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

// Generate an embedding
const { embedding, usage } = await embed({
  model,
  value: 'Hello, world!',
});

console.log('Embedding dimensions:', embedding.length); // 384
console.log('Tokens used:', usage.tokens);
```

> **First load:** The first time you use a model, it downloads from the HuggingFace Hub and is cached in IndexedDB. Subsequent loads are instant.
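Embedding vectors like the one above are usually compared with cosine similarity: vectors pointing in similar directions score close to 1, unrelated ones near 0. Here is a self-contained sketch of that math in plain TypeScript (no LocalMode dependency; the helper name is illustrative):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Parallel vectors score 1; orthogonal vectors score 0.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6]).toFixed(3)); // 1.000
console.log(cosineSimilarity([1, 0], [0, 1]).toFixed(3));       // 0.000
```

Vector databases such as the one in the next section rank stored vectors by exactly this kind of score.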
## Build a Semantic Search App
Here's a complete example of building semantic search:
```ts
import { createVectorDB, embedMany, semanticSearch } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// 1. Setup
const model = transformers.embedding('Xenova/bge-small-en-v1.5');
const db = await createVectorDB<{ text: string }>({
  name: 'my-documents',
  dimensions: 384,
});

// 2. Sample documents
const documents = [
  'Machine learning is a subset of artificial intelligence.',
  'Neural networks are inspired by biological neurons.',
  'Deep learning uses multiple layers of neural networks.',
  'Natural language processing handles human language.',
  'Computer vision enables machines to interpret images.',
];

// 3. Generate embeddings
const { embeddings } = await embedMany({
  model,
  values: documents,
});

// 4. Store in the vector database
await db.addMany(
  documents.map((text, i) => ({
    id: `doc-${i}`,
    vector: embeddings[i],
    metadata: { text },
  }))
);

// 5. Search
const results = await semanticSearch({
  db,
  model,
  query: 'How do neural networks work?',
  k: 3,
});

console.log('Results:');
results.forEach((r, i) => {
  console.log(`${i + 1}. ${r.metadata.text} (score: ${r.score.toFixed(3)})`);
});
```

Output:
```text
Results:
1. Neural networks are inspired by biological neurons. (score: 0.842)
2. Deep learning uses multiple layers of neural networks. (score: 0.756)
3. Machine learning is a subset of artificial intelligence. (score: 0.623)
```

## Add RAG with Chunking
For longer documents, use chunking and reranking:
```ts
import { createVectorDB, chunk, ingest, semanticSearch, rerank } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Setup
const embeddingModel = transformers.embedding('Xenova/bge-small-en-v1.5');
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');
const db = await createVectorDB({
  name: 'documents',
  dimensions: 384,
});

// Load and chunk a document
const documentText = `
Machine learning is revolutionizing how we build software...
(your long document here)
`;

const chunks = chunk(documentText, {
  strategy: 'recursive',
  size: 512,
  overlap: 50,
});

// Ingest with automatic embedding
await ingest({
  db,
  model: embeddingModel,
  documents: chunks.map((c) => ({
    text: c.text,
    metadata: { start: c.startIndex, end: c.endIndex },
  })),
});

// Search, then rerank for better accuracy
const query = 'What are the applications of machine learning?';
const searchResults = await semanticSearch({
  db,
  model: embeddingModel,
  query,
  k: 10, // fetch more candidates for reranking
});

const { results: reranked } = await rerank({
  model: rerankerModel,
  query,
  documents: searchResults.map((r) => r.metadata.text as string),
  topK: 3,
});

console.log('Top results after reranking:');
reranked.forEach((r, i) => {
  console.log(`${i + 1}. Score: ${r.score.toFixed(3)}`);
  console.log(`   ${r.text.substring(0, 100)}...`);
});
```

## Add LLM Chat
Three providers implement the same `LanguageModel` interface — choose based on your needs:
```ts
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

// Pick any provider — all share the same LanguageModel interface
const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
// const model = wllama.languageModel('Llama-3.2-1B-Instruct-Q4_K_M');
// const model = transformers.languageModel('onnx-community/Qwen3-0.6B-ONNX');

const result = await streamText({
  model,
  prompt: 'Explain quantum computing in simple terms',
  maxTokens: 500,
});

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}
```

Combine with RAG results for grounded answers:
```ts
// After getting search results from above...
const context = reranked.map((r) => r.text).join('\n\n');

const result = await streamText({
  model,
  prompt: `Based on the following context, answer the question.

Context:
${context}

Question: ${query}

Answer:`,
});

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text);
}
```

## Structured Output
Generate typed JSON objects with schema validation:
```ts
import { generateObject, jsonSchema } from '@localmode/core';
import { webllm } from '@localmode/webllm';
import { z } from 'zod';

const { object } = await generateObject({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  schema: jsonSchema(
    z.object({
      title: z.string(),
      summary: z.string(),
      tags: z.array(z.string()),
      sentiment: z.enum(['positive', 'negative', 'neutral']),
    })
  ),
  prompt: 'Analyze: "LocalMode makes AI accessible to everyone by running it in the browser"',
});

console.log(object.title);     // string
console.log(object.tags);      // string[]
console.log(object.sentiment); // 'positive' | 'negative' | 'neutral'
```

## React Hooks
For React apps, `@localmode/react` provides hooks for every core function, with built-in loading states, error handling, and cancellation:
```tsx
import { useChat } from '@localmode/react';
import { webllm } from '@localmode/webllm';

function ChatApp() {
  const { messages, sendMessage, isStreaming, cancel } = useChat({
    model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  });

  return (
    <div>
      {messages.map((msg) => (
        <div key={msg.id}>
          <strong>{msg.role}:</strong> {msg.content}
        </div>
      ))}
      <button onClick={() => sendMessage('Hello!')}>Send</button>
      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```

The same pattern works for embeddings and search:

```tsx
import { useSemanticSearch } from '@localmode/react';
import { transformers } from '@localmode/transformers';

// Create the model once, outside the component, so it isn't recreated on every render
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

function SearchApp() {
  // `db` is a VectorDB instance created elsewhere with createVectorDB()
  const { data, isLoading, execute } = useSemanticSearch({ model, db });

  return (
    <div>
      <input onChange={(e) => execute({ query: e.target.value, k: 5 })} />
      {isLoading && <p>Searching...</p>}
      {data?.map((r) => <p key={r.id}>{r.metadata.text}</p>)}
    </div>
  );
}
```

## Project Structure
A typical LocalMode project might look like:
```ts
import { transformers } from '@localmode/transformers';
import { webllm } from '@localmode/webllm';

// Model instances (created once, reused everywhere)
export const embeddingModel = transformers.embedding('Xenova/bge-small-en-v1.5');
export const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');
export const classifierModel = transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english');
export const llm = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
```

```ts
import { createVectorDB } from '@localmode/core';

// A single lazily created VectorDB instance shared across the app
let dbInstance: Awaited<ReturnType<typeof createVectorDB>> | null = null;

export async function getDB() {
  if (!dbInstance) {
    dbInstance = await createVectorDB({
      name: 'my-app',
      dimensions: 384,
    });
  }
  return dbInstance;
}
```

## What You Can Build
Beyond embeddings, RAG, and LLM chat, LocalMode provides a full suite of local AI capabilities:
| Category | Functions | Use Cases |
|---|---|---|
| Classification | `classify()`, `classifyZeroShot()`, `classifyMany()` | Sentiment analysis, email routing, content moderation |
| NER | `extractEntities()` | Document redaction, data extraction |
| Reranking | `rerank()` | Improved RAG accuracy with cross-encoder scoring |
| Vision | `captionImage()`, `detectObjects()`, `segmentImage()` | Image captioning, object detection, background removal |
| Multimodal Search | `embedImage()`, `embedManyImages()` | Cross-modal text-to-image search with CLIP |
| Audio | `transcribe()`, `synthesizeSpeech()` | Voice notes, meeting transcription, audiobook creation |
| Translation | `translate()` | Multi-language translation (20+ languages) |
| Summarization | `summarize()` | Text and document summarization |
| Structured Output | `generateObject()`, `streamObject()` | Typed JSON generation with Zod schema validation |
| Agents | `createAgent()`, `runAgent()` | ReAct loop with tool registry and VectorDB-backed memory |
| Document QA | `askDocument()`, `askTable()` | Invoice Q&A, form and table understanding |
| OCR | `extractText()` | Document scanning, text extraction from images |
| Fill-Mask | `fillMask()` | Autocomplete, masked token prediction |
| Question Answering | `answerQuestion()` | Extractive QA with confidence scores |
| Security | `encrypt()`, `decrypt()`, `redactPII()` | Encrypted vaults, PII redaction, differential privacy |
| Evaluation | `evaluateModel()`, `accuracy()`, `bleuScore()` | Model quality metrics for classification, generation, retrieval |
| Pipelines | `createPipeline()` | Composable multi-step workflows with 10 step types |
| Import/Export | `importFrom()`, `exportToCSV()` | Migrate vectors from Pinecone, ChromaDB, CSV, JSONL |
All of these work offline after the initial model download, with no server or API key required.
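To give a flavor of one of these capabilities, here is a deliberately simplified, dependency-free sketch of what PII redaction involves. This is not `redactPII()`'s implementation (the function name and regex patterns below are illustrative only), and production-grade redaction needs far more robust detection than two regexes:

```typescript
// Toy PII redaction: replace email addresses and phone-like numbers with tags.
// Real redaction catches many more entity types and edge cases.
function redactSimplePII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[EMAIL]')
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[PHONE]');
}

console.log(redactSimplePII('Contact jane@example.com or +1 (555) 123-4567.'));
// Contact [EMAIL] or [PHONE].
```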
## Next Steps

- **Core Package**: VectorDB, embeddings, RAG, agents, evaluation, middleware, and security.
- **React Hooks**: 46 hooks for chat, embeddings, classification, vision, audio, and more.
- **Transformers Provider**: 25 model factories for text, vision, audio, and multimodal tasks.
- **WebLLM Provider**: 30 curated WebGPU models for the fastest LLM inference.
- **Wllama Provider**: 135K+ GGUF models via llama.cpp WASM — universal browser support.
- **Chrome AI Provider**: Zero-download inference via Chrome's built-in Gemini Nano.