LangChain

Drop-in local inference for existing LangChain.js applications.

LangChain Integration

@localmode/langchain provides adapter classes so existing LangChain.js applications can swap cloud providers for 100% local inference by changing just three imports.
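
For example, a minimal sketch of that swap, assuming an app built on @langchain/openai with an in-memory vector store:

// Before: cloud inference
// import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
// import { MemoryVectorStore } from 'langchain/vectorstores/memory';

// After: the same three roles, served locally
import { LocalModeEmbeddings, ChatLocalMode, LocalModeVectorStore } from '@localmode/langchain';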

See it in action

Try LangChain RAG for a working demo.

Adapters

LangChain Base Class    LocalMode Adapter     Wraps           Provider
Embeddings              LocalModeEmbeddings   EmbeddingModel  @localmode/transformers
BaseChatModel           ChatLocalMode         LanguageModel   @localmode/webllm
VectorStore             LocalModeVectorStore  VectorDB        @localmode/core
BaseDocumentCompressor  LocalModeReranker     RerankerModel   @localmode/transformers

Each adapter is a thin, stateless wrapper — it converts between LangChain and LocalMode data formats and delegates all work to the underlying model or database.
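To make that concrete, here is a simplified sketch of what the embeddings adapter does. It is an illustration, not the package source, and the EmbeddingModel interface shown is hypothetical:

import { Embeddings } from '@langchain/core/embeddings';

// Hypothetical interface for illustration; the real EmbeddingModel comes
// from a provider package such as @localmode/transformers.
interface EmbeddingModel {
  embed(texts: string[]): Promise<Float32Array[]>;
}

class SimplifiedEmbeddingsAdapter extends Embeddings {
  private model: EmbeddingModel;

  constructor(fields: { model: EmbeddingModel }) {
    super({});
    this.model = fields.model;
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    // Delegate all work to the local model...
    const vectors = await this.model.embed(texts);
    // ...and convert Float32Array -> number[] for LangChain.
    return vectors.map((v) => Array.from(v));
  }

  async embedQuery(text: string): Promise<number[]> {
    const [vector] = await this.embedDocuments([text]);
    return vector;
  }
}

The chat model and vector store adapters follow the same pattern: translate arguments in, delegate, translate results out.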

Installation

pnpm install @localmode/langchain @localmode/core @localmode/transformers
# Add webllm if using ChatLocalMode:
pnpm install @localmode/webllm

Quick Start — Full RAG Chain

import { LocalModeEmbeddings, ChatLocalMode, LocalModeVectorStore } from '@localmode/langchain';
import { transformers } from '@localmode/transformers';
import { webllm } from '@localmode/webllm';
import { createVectorDB } from '@localmode/core';

// 1. Create local models
const embeddingModel = transformers.embedding('Xenova/bge-small-en-v1.5');
const llmModel = webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC');

// 2. Wrap in LangChain adapters
const embeddings = new LocalModeEmbeddings({ model: embeddingModel });
const llm = new ChatLocalMode({ model: llmModel });

// 3. Create vector store backed by local IndexedDB (384 matches bge-small-en-v1.5's embedding size)
const db = await createVectorDB({ name: 'docs', dimensions: 384 });
const store = new LocalModeVectorStore(embeddings, { db });

// 4. Add documents (embeds automatically)
await store.addDocuments([
  { pageContent: 'LocalMode runs AI in the browser.', metadata: { source: 'docs' } },
  { pageContent: 'Data never leaves the device.', metadata: { source: 'docs' } },
]);

// 5. Search
const results = await store.similaritySearch('privacy', 3);

// 6. Generate with context
const context = results.map((r) => r.pageContent).join('\n');
const answer = await llm.invoke(`Based on: ${context}\n\nQuestion: How does LocalMode handle privacy?`);

Everything runs locally. No API keys, no servers, no data leaves the device.
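
Because LocalModeVectorStore extends LangChain's VectorStore base class, the standard retriever interface should also work, assuming asRetriever() is inherited unchanged from @langchain/core:

const retriever = store.asRetriever(3); // top-3 documents per query
const docs = await retriever.invoke('How does LocalMode handle privacy?');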

Key Design Decisions

  • User provides model instances — You create the LocalMode model and pass it to the adapter. The adapter doesn't know about provider packages.
  • Float32Array to number[] — LangChain uses number[][] for embeddings. The adapter converts automatically via Array.from().
  • Streaming fallback — ChatLocalMode._stream() uses the model's doStream() if available, otherwise falls back to generating the full response and yielding it as a single chunk (see the sketch after this list).
  • No tool calling — Local models have limited tool-calling ability. ChatLocalMode returns text-only content.
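
A sketch of that streaming fallback, for illustration: doStream() is named above, but doGenerate(), the prompt shape, and the chunk plumbing here are simplified assumptions, not the package source.

import { AIMessageChunk } from '@langchain/core/messages';
import { ChatGenerationChunk } from '@langchain/core/outputs';

// Hypothetical model surface, for illustration only.
interface LocalLanguageModel {
  doStream?(prompt: string): AsyncIterable<string>;
  doGenerate(prompt: string): Promise<string>;
}

async function* streamWithFallback(model: LocalLanguageModel, prompt: string) {
  if (model.doStream) {
    // Native streaming: forward each text delta as its own chunk.
    for await (const delta of model.doStream(prompt)) {
      yield new ChatGenerationChunk({
        text: delta,
        message: new AIMessageChunk({ content: delta }),
      });
    }
  } else {
    // Fallback: generate the full response, yield it as a single chunk.
    const full = await model.doGenerate(prompt);
    yield new ChatGenerationChunk({
      text: full,
      message: new AIMessageChunk({ content: full }),
    });
  }
}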

Package Details

Property           Value
Package            @localmode/langchain
Dependencies       @langchain/core (>=0.3.0)
Peer Dependencies  @localmode/core (>=1.0.0)
Bundle             ESM + CJS, tree-shakeable
Side Effects       None

For individual adapter docs, see Embeddings, Chat Model, and Vector Store; for moving an existing app over, see the Migration Guide.

Showcase Apps

App            Description                                   Links
LangChain RAG  End-to-end RAG app using LangChain adapters   Demo · Source
