# Text Generation

Generate and stream text using local language models.
## streamText()

Stream text generation for real-time responses:

```typescript
import { streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

const model = webllm.languageModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');

const stream = await streamText({
  model,
  prompt: 'Explain quantum computing in simple terms.',
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
```

### With System Prompt
```typescript
const stream = await streamText({
  model,
  system: 'You are a helpful coding assistant. Be concise.',
  prompt: 'Write a function to reverse a string in TypeScript.',
});
```

### Options
```typescript
interface StreamTextOptions {
  model: LanguageModel;
  prompt: string;
  system?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  stopSequences?: string[];
  abortSignal?: AbortSignal;
}
```
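The optional fields can be combined as needed. A small illustrative sketch using only the fields documented above (the prompt and values are arbitrary):

```typescript
const stream = await streamText({
  model,
  system: 'You are a concise technical writer.',
  prompt: 'Summarize the benefits of running models locally.',
  maxTokens: 256,          // cap generation length
  temperature: 0.7,        // moderate randomness
  stopSequences: ['\n\n'], // stop at the first blank line
});
```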
### Stream Properties

```typescript
const stream = await streamText({ model, prompt: 'Hello' });

// Iterate over text chunks
for await (const chunk of stream) {
  console.log(chunk.text);   // The generated text piece
  console.log(chunk.isLast); // Whether this is the last chunk
}

// Get full text after streaming
const fullText = await stream.text;

// Get usage statistics
const usage = await stream.usage;
console.log('Tokens:', usage.totalTokens);
```

## generateText()
Generate complete text without streaming:

```typescript
import { generateText } from '@localmode/core';

const { text, usage } = await generateText({
  model,
  prompt: 'Write a haiku about programming.',
});

console.log(text);
console.log('Tokens used:', usage.totalTokens);
```

### Options
```typescript
interface GenerateTextOptions {
  model: LanguageModel;
  prompt: string;
  system?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  stopSequences?: string[];
  abortSignal?: AbortSignal;
}
```

### Return Value
```typescript
interface GenerateTextResult {
  text: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  response: {
    modelId: string;
    timestamp: Date;
  };
}
```
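The `usage` and `response` metadata can be read alongside the generated text. A short sketch using only the fields documented above:

```typescript
const { text, usage, response } = await generateText({
  model,
  prompt: 'Write a haiku about programming.',
});

console.log(text);
console.log('Model:', response.modelId);                        // ID of the model that produced the text
console.log('Generated at:', response.timestamp.toISOString()); // completion timestamp
console.log('Prompt tokens:', usage.promptTokens);
console.log('Completion tokens:', usage.completionTokens);
```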
## Cancellation

Cancel generation mid-stream:

```typescript
const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  const stream = await streamText({
    model,
    prompt: 'Write a long essay...',
    abortSignal: controller.signal,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.text);
  }
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('\nGeneration cancelled');
  }
}
```

## Temperature & Sampling
Control randomness in generation:

```typescript
// More deterministic (good for factual responses)
const factualStream = await streamText({
  model,
  prompt: 'What is 2 + 2?',
  temperature: 0.1,
});

// More creative (good for stories, brainstorming)
const creativeStream = await streamText({
  model,
  prompt: 'Write a creative story about a robot.',
  temperature: 0.9,
});

// Nucleus sampling
const nucleusStream = await streamText({
  model,
  prompt: 'Continue this sentence: The future of AI is...',
  topP: 0.9, // Consider tokens making up 90% of probability mass
});
```

| Parameter | Description | Range | Default |
|---|---|---|---|
| `temperature` | Randomness | 0.0 - 2.0 | 1.0 |
| `topP` | Nucleus sampling | 0.0 - 1.0 | 1.0 |
| `maxTokens` | Max generation length | 1 - model max | Model default |
## Stop Sequences

Stop generation at specific patterns:

```typescript
const stream = await streamText({
  model,
  prompt: 'List three fruits:\n1.',
  stopSequences: ['\n4.', '\n\n'], // Stop before a 4th item or at a double newline
});
```

## Chat-Style Prompts
Build chat applications:

```typescript
function buildPrompt(messages: Array<{ role: string; content: string }>) {
  return messages
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n') + '\nassistant:';
}

const messages = [
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi! How can I help you today?' },
  { role: 'user', content: 'What is TypeScript?' },
];

const stream = await streamText({
  model,
  system: 'You are a helpful programming assistant.',
  prompt: buildPrompt(messages),
  stopSequences: ['user:', '\n\n'],
});
```
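To keep a conversation going across turns, append each reply to the message history before building the next prompt. A minimal sketch on top of the helpers above (`chatTurn` is a hypothetical wrapper, not part of the API):

```typescript
async function chatTurn(userInput: string) {
  messages.push({ role: 'user', content: userInput });

  const stream = await streamText({
    model,
    system: 'You are a helpful programming assistant.',
    prompt: buildPrompt(messages),
    stopSequences: ['user:', '\n\n'],
  });

  // Accumulate the streamed chunks into the full reply
  let reply = '';
  for await (const chunk of stream) {
    reply += chunk.text;
  }

  messages.push({ role: 'assistant', content: reply });
  return reply;
}
```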
## RAG Integration

Combine with retrieval:

```typescript
import { semanticSearch, streamText } from '@localmode/core';

// `db`, `embeddingModel`, and `llm` are assumed to be set up elsewhere
async function ragQuery(question: string) {
  // Retrieve context
  const results = await semanticSearch({ db, model: embeddingModel, query: question, k: 3 });
  const context = results.map((r) => r.metadata.text).join('\n\n');

  // Generate answer
  const stream = await streamText({
    model: llm,
    system: 'Answer based only on the provided context.',
    prompt: `Context:\n${context}\n\nQuestion: ${question}\n\nAnswer:`,
  });

  return stream;
}
```
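The returned stream is consumed like any other `streamText()` result (the question below is just an example):

```typescript
const stream = await ragQuery('How do I enable streaming responses?');

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}
```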
## Implementing Custom Models

Create your own language model:

```typescript
import type { LanguageModel, GenerateTextOptions, StreamTextOptions } from '@localmode/core';

class MyLanguageModel implements LanguageModel {
  readonly modelId = 'custom:my-model';
  readonly provider = 'custom';

  async doGenerateText(options: GenerateTextOptions) {
    // Your generation logic
    return {
      text: 'Generated text...',
      usage: { promptTokens: 10, completionTokens: 20, totalTokens: 30 },
    };
  }

  async doStreamText(options: StreamTextOptions) {
    // Return an async generator
    return (async function* () {
      yield { text: 'Hello', isLast: false };
      yield { text: ' world!', isLast: true };
    })();
  }
}
```
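A custom model should plug into the same helpers as the built-in providers, since the options accept any `LanguageModel` implementation (the variable name below is illustrative):

```typescript
const customModel = new MyLanguageModel();

const { text } = await generateText({
  model: customModel,
  prompt: 'Hello',
});

console.log(text);
```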
## Best Practices

### Generation Tips

- **Stream for UX**: Always use `streamText()` for user-facing apps
- **Set max tokens**: Prevent runaway generation
- **Use system prompts**: Guide model behavior consistently
- **Handle errors**: Wrap generation in try/catch
- **Provide cancellation**: Let users abort long generations
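A minimal sketch combining several of these tips (the prompt text and limits are illustrative):

```typescript
const controller = new AbortController();
// e.g. wire controller.abort() to a "Stop" button in your UI

try {
  const stream = await streamText({
    model,
    system: 'You are a concise assistant.',
    prompt: 'Explain what a service worker does.',
    maxTokens: 512,                 // prevent runaway generation
    abortSignal: controller.signal, // let the user cancel
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.text); // stream for responsive UX
  }
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('\nGeneration cancelled');
  } else {
    throw error;
  }
}
```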