
ChatLocalMode — use any LocalMode language model with LangChain.

ChatLocalMode

Drop-in replacement for ChatOpenAI or any LangChain BaseChatModel, backed by local LLM inference.

See it in action

Try LangChain RAG for a working demo.

Constructor

```typescript
import { ChatLocalMode } from '@localmode/langchain';
import { webllm } from '@localmode/webllm';

const llm = new ChatLocalMode({
  model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
  temperature: 0.7,
  maxTokens: 500,
  systemPrompt: 'You are a helpful assistant.',
});
```

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | `LanguageModel` | Yes | — | Any `@localmode/core` `LanguageModel` instance |
| `temperature` | `number` | No | — | Default sampling temperature (0–2) |
| `maxTokens` | `number` | No | — | Default max tokens to generate |
| `systemPrompt` | `string` | No | — | Default system prompt |

Basic Usage

invoke

```typescript
const result = await llm.invoke('What is the capital of France?');
console.log(result.content); // "The capital of France is Paris."
```

With Messages

```typescript
import { HumanMessage, SystemMessage, AIMessage } from '@langchain/core/messages';

const result = await llm.invoke([
  new SystemMessage('You are a helpful coding assistant.'),
  new HumanMessage('Write a hello world in Python'),
]);
```

Message Mapping

The adapter maps LangChain message types to LocalMode roles:

| LangChain Message | LocalMode Role | Notes |
| --- | --- | --- |
| `HumanMessage` | `user` | |
| `AIMessage` | `assistant` | |
| `SystemMessage` | `system` | First `SystemMessage` extracted as `systemPrompt` |

If the first message is a SystemMessage, its content is passed as the systemPrompt parameter to doGenerate() — it is not included in the messages array.
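The mapping rule above can be sketched as a standalone function. This is an illustration of the described behavior, not the adapter's actual source; the names `mapMessages` and `MappedPrompt` are hypothetical.

```typescript
type Role = 'system' | 'user' | 'assistant';

// Minimal stand-in for a LangChain message (real messages expose _getType()).
interface LCMessage {
  _getType(): string; // 'human' | 'ai' | 'system'
  content: string;
}

interface MappedPrompt {
  systemPrompt?: string;
  messages: { role: Role; content: string }[];
}

function mapMessages(msgs: LCMessage[]): MappedPrompt {
  const roleFor: Record<string, Role> = { human: 'user', ai: 'assistant', system: 'system' };
  let systemPrompt: string | undefined;
  const messages: { role: Role; content: string }[] = [];

  msgs.forEach((m, i) => {
    if (i === 0 && m._getType() === 'system') {
      // First SystemMessage becomes the systemPrompt, not a message entry.
      systemPrompt = m.content;
      return;
    }
    messages.push({ role: roleFor[m._getType()] ?? 'user', content: m.content });
  });

  return { systemPrompt, messages };
}
```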

Streaming

ChatLocalMode supports streaming via _streamResponseChunks():

```typescript
const stream = await llm.stream('Tell me a story');

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}
```

Streaming Behavior

  • If the wrapped LanguageModel has a doStream() method (e.g., WebLLM models), the adapter yields real ChatGenerationChunk objects as tokens arrive.
  • If doStream() is not available, the adapter falls back to _generate() and yields the full response as a single chunk.

Finish Reason

The ChatResult includes the model's finish reason:

```typescript
const result = await llm._generate([new HumanMessage('hello')]);
console.log(result.llmOutput?.finishReason); // 'stop' | 'length'
```

Supported Models

Any LanguageModel from @localmode/webllm:

| Model | Size | Best For |
| --- | --- | --- |
| `Llama-3.2-1B-Instruct-q4f16_1-MLC` | 712 MB | Fast, small tasks |
| `Qwen2.5-1.5B-Instruct-q4f16_1-MLC` | 1.0 GB | Balanced |
| `Qwen3-1.7B-q4f16_1-MLC` | 1.1 GB | Best quality (small) |
| `Phi-3.5-mini-instruct-q4f16_1-MLC` | 2.1 GB | Reasoning tasks |

Limitations: Local models do not support tool calling or structured output via LangChain's tool/schema APIs. Use ChatLocalMode for text generation only. For structured output, use generateObject() from @localmode/core directly.

Migration from OpenAI

```diff
- import { ChatOpenAI } from '@langchain/openai';
+ import { ChatLocalMode } from '@localmode/langchain';
+ import { webllm } from '@localmode/webllm';

- const llm = new ChatOpenAI({ modelName: 'gpt-4o-mini', temperature: 0.7 });
+ const llm = new ChatLocalMode({
+   model: webllm.languageModel('Qwen3-1.7B-q4f16_1-MLC'),
+   temperature: 0.7,
+ });
```

Usage is identical:

```typescript
const result = await llm.invoke('Hello');
const stream = await llm.stream('Tell me a story');
```

The _llmType() method returns 'localmode' for identification in LangChain logging and callbacks.

Showcase Apps

| App | Description | Links |
| --- | --- | --- |
| LangChain RAG | Chat interface using the `ChatLocalMode` adapter | Demo · Source |
