Text-to-Speech
Generate speech audio from text with Kokoro TTS — 29 English voices, phonemizer-backed.
Generate natural-sounding speech audio from text using Kokoro TTS running locally in the browser. It offers phonemizer-backed synthesis, 29 English voices (American and British), speed control, and streaming playback.
For the full API reference (synthesizeSpeech(), options, result types, and custom providers), see the Core Audio guide.
See it in action
Try Audiobook Creator for long-form TTS or Voice Studio to browse and compare all 29 English voices.
Recommended Models
| Model | Size | Quality | Voices | Languages |
|---|---|---|---|---|
| onnx-community/Kokoro-82M-v1.0-ONNX | ~86MB (q8) | High | 29 | English (US & GB) |
Legacy models (generic pipeline, no phonemizer):
| Model | Size | Quality | Notes |
|---|---|---|---|
| Xenova/speecht5_tts | ~100MB | Basic | Requires separate vocoder |
| Xenova/mms-tts-eng | ~30MB | Medium | VITS model, smaller download |
Voice Selection
Kokoro ships 29 English voices (American & British). Import the KOKORO_VOICES catalog for UI display:
import { transformers, KOKORO_VOICES, KOKORO_DEFAULT_VOICE } from '@localmode/transformers';
import { synthesizeSpeech } from '@localmode/core';
const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');
// List all voices
for (const voice of KOKORO_VOICES) {
  console.log(`${voice.id}: ${voice.name} (${voice.languageLabel}, ${voice.gender})`);
}
// Synthesize with a specific voice
const { audio } = await synthesizeSpeech({
  model,
  text: 'Hello from London!',
  voice: 'bf_emma', // British female "Emma"
  speed: 1.0,
});
Voice Naming Convention
Voice IDs follow the pattern [lang][gender]_[name]:
| Prefix | Language | Example |
|---|---|---|
| af_ / am_ | American English | af_heart, am_michael |
| bf_ / bm_ | British English | bf_emma, bm_george |
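The naming convention above can be decoded in a few lines, which may be handy when grouping voices in a picker. This is a minimal sketch and not part of the library: `parseVoiceId` and its types are hypothetical names, and it assumes that the first letter is `a` (American) or `b` (British) and the second is `f` (female) or `m` (male), as the examples in the table suggest.

```typescript
type VoiceLanguage = 'American English' | 'British English';
type VoiceGender = 'female' | 'male';

interface ParsedVoiceId {
  language: VoiceLanguage;
  gender: VoiceGender;
  name: string;
}

// Assumed prefix mappings, inferred from the table above.
const LANGUAGES: Record<string, VoiceLanguage> = {
  a: 'American English',
  b: 'British English',
};
const GENDERS: Record<string, VoiceGender> = { f: 'female', m: 'male' };

// Hypothetical helper: returns null when the ID does not match
// the [lang][gender]_[name] pattern.
function parseVoiceId(id: string): ParsedVoiceId | null {
  const match = /^([ab])([fm])_(.+)$/.exec(id);
  if (!match) return null;
  const [, lang, gender, name] = match;
  return { language: LANGUAGES[lang], gender: GENDERS[gender], name };
}
```

For example, `parseVoiceId('bf_emma')` yields a British English female voice named `emma`, while an ID with an unknown prefix yields `null`. When the full catalog is available, the `languageLabel` and `gender` fields on `KOKORO_VOICES` entries are likely the better source of truth.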
Speed Control
Adjust synthesis speed from 0.5x (slow) to 2.0x (fast):
const { audio } = await synthesizeSpeech({
  model,
  text: 'This is fast speech.',
  voice: 'af_heart',
  speed: 1.5,
});
Streaming with streamSynthesizeSpeech
For real-time voice loops (e.g. an LLM reply being read aloud), use streamSynthesizeSpeech(). It splits the text into clauses and yields each clause's audio as soon as it finishes:
import { streamSynthesizeSpeech, playStreamedSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const kokoro = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');
function speak(text: string) {
  const ctx = new AudioContext();
  const stream = streamSynthesizeSpeech({
    model: kokoro,
    text,
    voice: 'af_heart',
    speed: 1.0,
  });
  return playStreamedSpeech(stream, ctx);
}
For the full API reference (options, result types, AbortSignal, custom splitters), see the Core Streaming Speech guide.
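To make the clause-splitting idea concrete, here is a rough sketch of what such a splitter could look like. It is an illustration only and not the splitter that streamSynthesizeSpeech() actually uses: `splitIntoClauses` is a hypothetical name, and the choice of delimiters (`. ! ? ; : ,`) is an assumption.

```typescript
// Hypothetical clause splitter: cuts text after runs of punctuation
// so each piece can be synthesized and played independently.
function splitIntoClauses(text: string): string[] {
  // One or more non-delimiter characters, followed by any trailing delimiters.
  const matches = text.match(/[^.!?;:,]+[.!?;:,]*/g) ?? [];
  return matches
    .map((clause) => clause.trim())
    .filter((clause) => clause.length > 0);
}
```

With this sketch, `splitIntoClauses('Hello, world. How are you?')` produces three clauses: `Hello,`, `world.`, and `How are you?`. A naive regular expression like this also splits decimals such as `3.5` and abbreviations such as `e.g.`, which is one reason a production splitter would need more care.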
Provider Options
Kokoro-specific options via providerOptions.kokoro:
| Option | Type | Default | Description |
|---|---|---|---|
| dtype | 'q8' \| 'fp16' \| 'fp32' \| 'q4' \| 'q4f16' | 'q8' | Model quantization level |
const { audio } = await synthesizeSpeech({
  model,
  text: 'High precision audio.',
  voice: 'af_heart',
  providerOptions: { kokoro: { dtype: 'fp16' } },
});
Best Practices
TTS Tips
- Use streaming for long text: streamSynthesizeSpeech() plays audio while still generating
- Pick the right voice: Browse all 29 in the Voice Studio
- Speed 1.0 is best quality: Extreme speeds (0.5 or 2.0) may reduce naturalness
- q8 is the sweet spot: 86MB download with no perceptible quality loss vs fp32 (326MB)
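When the speed comes from user input such as a slider, it may help to keep it inside the documented 0.5x to 2.0x range before passing it to synthesizeSpeech(). The helper below is a small sketch under that assumption. `clampSpeed` and its constants are hypothetical names, and whether the library itself validates or clamps out-of-range speeds is not stated here.

```typescript
// Range and default taken from the Speed Control section and the tips above.
const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;
const DEFAULT_SPEED = 1.0;

// Hypothetical helper: falls back to the default for NaN or infinite
// input, otherwise clamps the value into [MIN_SPEED, MAX_SPEED].
function clampSpeed(requested: number): number {
  if (!Number.isFinite(requested)) return DEFAULT_SPEED;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, requested));
}
```

A call site could then pass `speed: clampSpeed(userValue)` in the options object, so that a stray slider value of 3 is treated as 2.0 rather than reaching the model unchanged.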
Showcase Apps
| App | Description | Links |
|---|---|---|
| Audiobook Creator | Streaming TTS with voice selection and speed control | Demo · Source |
| Voice Studio | Browse, preview, and compare all 29 Kokoro voices | Demo · Source |