

Migration Guide

Migrate existing LangChain.js applications from cloud providers to 100% local inference. Only provider instantiation changes; all chain, retriever, and agent code stays the same.

See it in action

Try LangChain RAG for a working demo of a fully migrated RAG pipeline.

Component Mapping

| Component | Cloud (Before) | Local (After) |
| --- | --- | --- |
| Embeddings | OpenAIEmbeddings | LocalModeEmbeddings |
| Chat Model | ChatOpenAI | ChatLocalMode |
| Vector Store | PineconeStore / ChromaStore | LocalModeVectorStore |
| Reranker | CohereRerank | LocalModeReranker |

Full RAG Chain Migration

Before: Cloud

import { ChatOpenAI } from '@langchain/openai';
import { OpenAIEmbeddings } from '@langchain/openai';
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';

// Requires: OPENAI_API_KEY, PINECONE_API_KEY
const llm = new ChatOpenAI({ modelName: 'gpt-4o-mini', temperature: 0.7 });
const embeddings = new OpenAIEmbeddings({ modelName: 'text-embedding-3-small' });

const pinecone = new Pinecone();
const index = pinecone.Index('my-docs');
const store = await PineconeStore.fromExistingIndex(embeddings, { pineconeIndex: index });

const retriever = store.asRetriever({ k: 5 });

After: Local

import { ChatLocalMode, LocalModeEmbeddings, LocalModeVectorStore } from '@localmode/langchain';
import { transformers } from '@localmode/transformers';
import { webllm } from '@localmode/webllm';
import { createVectorDB } from '@localmode/core';

// No API keys needed
const llm = new ChatLocalMode({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  temperature: 0.7,
});
const embeddings = new LocalModeEmbeddings({
  model: transformers.embedding('Xenova/bge-small-en-v1.5'),
});

const db = await createVectorDB({ name: 'my-docs', dimensions: 384 });
const store = new LocalModeVectorStore(embeddings, { db });

const retriever = store.asRetriever({ k: 5 });

Everything after the retriever (chains, prompts, output parsers) stays identical.
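For example, an LCEL chain built on top of the retriever is written once and runs unchanged against either provider set. A sketch using standard LangChain.js core APIs; the prompt wording is illustrative:

```typescript
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnablePassthrough, RunnableSequence } from '@langchain/core/runnables';
import { formatDocumentsAsString } from 'langchain/util/document';

// Identical for cloud and local: only `llm` and `retriever` differ upstream
const prompt = ChatPromptTemplate.fromTemplate(
  'Answer using only this context:\n{context}\n\nQuestion: {question}'
);

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  llm,
  new StringOutputParser(),
]);

const answer = await chain.invoke('How do I switch embedding providers?');
```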

Individual Adapter Migrations

Embeddings

- import { OpenAIEmbeddings } from '@langchain/openai';
+ import { LocalModeEmbeddings } from '@localmode/langchain';
+ import { transformers } from '@localmode/transformers';

- const embeddings = new OpenAIEmbeddings({ modelName: 'text-embedding-3-small' });
+ const embeddings = new LocalModeEmbeddings({
+   model: transformers.embedding('Xenova/bge-small-en-v1.5'),
+ });
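Since the adapter plugs into LangChain's standard Embeddings interface, downstream calls look the same as before. A sketch; embedQuery and embedDocuments are the standard interface methods:

```typescript
// Same calls as with OpenAIEmbeddings
const queryVec = await embeddings.embedQuery('how do I migrate?');
const docVecs = await embeddings.embedDocuments(['first doc', 'second doc']);

// Xenova/bge-small-en-v1.5 outputs 384-dimensional vectors; this must
// match the `dimensions` value passed to createVectorDB later
console.log(queryVec.length); // 384
```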

Chat Model

- import { ChatOpenAI } from '@langchain/openai';
+ import { ChatLocalMode } from '@localmode/langchain';
+ import { webllm } from '@localmode/webllm';

- const llm = new ChatOpenAI({ modelName: 'gpt-4o-mini', temperature: 0.7 });
+ const llm = new ChatLocalMode({
+   model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
+   temperature: 0.7,
+ });
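ChatLocalMode is a LangChain chat model, so the usual Runnable surface applies. A sketch of invocation and token streaming, assuming the underlying model supports streaming:

```typescript
import { HumanMessage } from '@langchain/core/messages';

// Standard LangChain chat-model usage; nothing LocalMode-specific here
const reply = await llm.invoke([new HumanMessage('Summarize RAG in one sentence.')]);

// Token streaming via the normal Runnable API
const stream = await llm.stream('Explain reranking briefly.');
for await (const chunk of stream) {
  process.stdout.write(String(chunk.content));
}
```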

Vector Store

- import { PineconeStore } from '@langchain/pinecone';
+ import { LocalModeVectorStore } from '@localmode/langchain';
+ import { createVectorDB } from '@localmode/core';

- const store = await PineconeStore.fromExistingIndex(embeddings, { pineconeIndex });
+ const db = await createVectorDB({ name: 'docs', dimensions: 384 });
+ const store = new LocalModeVectorStore(embeddings, { db });
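One migration pitfall: Pinecone's index dimension was configured server-side, whereas here the dimensions value must equal the embedding model's output size (384 for bge-small-en-v1.5). A hypothetical guard, not part of @localmode/core, that fails fast on a mismatch:

```typescript
// Hypothetical helper: verify an embedding vector matches the
// vector DB's configured dimensionality before bulk-inserting
function assertDimensionsMatch(vector: number[], expected: number): void {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, DB expects ${expected}`
    );
  }
}
```

Run it once against an embedQuery result before inserting documents.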

Reranker

- import { CohereRerank } from '@langchain/cohere';
+ import { LocalModeReranker } from '@localmode/langchain';
+ import { transformers } from '@localmode/transformers';

- const reranker = new CohereRerank({ model: 'rerank-english-v3.0', topN: 5 });
+ const reranker = new LocalModeReranker({
+   model: transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2'),
+   topK: 5,
+ });

The LocalModeReranker implements BaseDocumentCompressor, so it works with LangChain's ContextualCompressionRetriever:

import { ContextualCompressionRetriever } from 'langchain/retrievers/contextual_compression';

const compressedRetriever = new ContextualCompressionRetriever({
  baseCompressor: reranker,
  baseRetriever: store.asRetriever({ k: 20 }),
});

// Returns top 5 most relevant docs from the initial 20
const results = await compressedRetriever.invoke('search query');

What Changes, What Doesn't

| Aspect | Changes? | Details |
| --- | --- | --- |
| Chain/agent code | No | Same LangChain chains, retrievers, prompts |
| Provider imports | Yes | Swap the provider import and constructor lines |
| API keys | Removed | No OPENAI_API_KEY or PINECONE_API_KEY needed |
| Monthly cost | $0 | All inference is local, unlimited usage |
| Data privacy | Improved | Documents and embeddings never leave the device |
| First-run time | Slower | One-time model download (33 MB to 2 GB depending on models) |
| Model quality | Different | Smaller models: excellent for embeddings/reranking, adequate for simple generation |
| Streaming | Supported | ChatLocalMode streams via doStream() when available |
| Tool calling | Not supported | Local models don't support LangChain's tool/function calling |
| Structured output | Not supported | Use generateObject() from @localmode/core directly instead |

Hybrid approach: Use LocalModeEmbeddings + LocalModeVectorStore + LocalModeReranker for the retrieval pipeline (runs locally at no per-request cost), and keep ChatOpenAI for the generation step when you need frontier-model quality. This eliminates the embedding, vector store, and reranking bills entirely while preserving answer quality; only the final generation call still costs money.
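A sketch of that hybrid wiring, assuming the local reranker and vector store built in the sections above plus @langchain/openai; only the generation step touches the network:

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ContextualCompressionRetriever } from 'langchain/retrievers/contextual_compression';

// Local, no per-request cost: embeddings + vector store + reranker
const localRetriever = new ContextualCompressionRetriever({
  baseCompressor: reranker,                    // LocalModeReranker
  baseRetriever: store.asRetriever({ k: 20 }), // LocalModeVectorStore
});

// Cloud, paid: only the final answer generation
const llm = new ChatOpenAI({ modelName: 'gpt-4o-mini', temperature: 0.7 });

const question = 'user question';
const docs = await localRetriever.invoke(question);
const context = docs.map((d) => d.pageContent).join('\n\n');
const answer = await llm.invoke(`Context:\n${context}\n\nQuestion: ${question}`);
```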

Showcase Apps

| App | Description | Links |
| --- | --- | --- |
| LangChain RAG | End-to-end RAG app demonstrating migration from cloud to local providers | Demo · Source |
