
Text-to-Speech

Generate speech audio from text with Kokoro TTS — 29 English voices, phonemizer-backed.

Generate natural-sounding speech audio from text using Kokoro TTS, running locally in the browser, with phonemizer-backed synthesis, 29 English voices (American and British), speed control, and streaming playback.

For full API reference (synthesizeSpeech(), options, result types, and custom providers), see the Core Audio guide.

See it in action

Try Audiobook Creator for long-form TTS or Voice Studio to browse and compare all 29 English voices.

Model                                Size        Quality  Voices  Languages
onnx-community/Kokoro-82M-v1.0-ONNX  ~86MB (q8)  High     29      English (US & GB)

Legacy models (generic pipeline, no phonemizer):

Model                Size    Quality  Notes
Xenova/speecht5_tts  ~100MB  Basic    Requires separate vocoder
Xenova/mms-tts-eng   ~30MB   Medium   VITS model, smaller download

Voice Selection

Kokoro ships 29 English voices (American & British). Import the KOKORO_VOICES catalog for UI display:

import { transformers, KOKORO_VOICES, KOKORO_DEFAULT_VOICE } from '@localmode/transformers';
import { synthesizeSpeech } from '@localmode/core';

const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

// List all voices
for (const voice of KOKORO_VOICES) {
  console.log(`${voice.id}: ${voice.name} (${voice.languageLabel}, ${voice.gender})`);
}

// Synthesize with a specific voice
const { audio } = await synthesizeSpeech({
  model,
  text: 'Hello from London!',
  voice: 'bf_emma', // British female "Emma"
  speed: 1.0,
});
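
For a voice picker, it often helps to group the catalog by language before rendering. The sketch below assumes each catalog entry carries the four fields used in the listing above (id, name, languageLabel, gender). The VoiceInfo interface, the groupVoicesByLanguage helper, and the sample values are hypothetical and only stand in for the real catalog.

```typescript
// Hypothetical shape: only the fields read in the listing above are assumed.
interface VoiceInfo {
  id: string;
  name: string;
  languageLabel: string;
  gender: string;
}

// Group voices by language label, e.g. to render one option group per language.
function groupVoicesByLanguage(
  voices: readonly VoiceInfo[],
): Map<string, VoiceInfo[]> {
  const groups = new Map<string, VoiceInfo[]>();
  for (const voice of voices) {
    const bucket = groups.get(voice.languageLabel) ?? [];
    bucket.push(voice);
    groups.set(voice.languageLabel, bucket);
  }
  return groups;
}

// Illustrative sample data (label and gender strings are assumptions).
const sampleVoices: VoiceInfo[] = [
  { id: 'af_heart', name: 'Heart', languageLabel: 'American English', gender: 'female' },
  { id: 'am_michael', name: 'Michael', languageLabel: 'American English', gender: 'male' },
  { id: 'bf_emma', name: 'Emma', languageLabel: 'British English', gender: 'female' },
  { id: 'bm_george', name: 'George', languageLabel: 'British English', gender: 'male' },
];

const grouped = groupVoicesByLanguage(sampleVoices);
```

In an application you would pass KOKORO_VOICES instead of sampleVoices, provided its entries match the assumed shape.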

Voice Naming Convention

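Voice IDs follow the pattern [lang][gender]_[name]. The first letter is the accent (a for American, b for British), the second is the gender (f for female, m for male), and the voice name follows the underscore: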

Prefix     Language          Example
af_ / am_  American English  af_heart, am_michael
bf_ / bm_  British English   bf_emma, bm_george
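
The naming convention can be decoded mechanically, which is handy when only a voice ID is available (for example, one restored from saved settings). This is a minimal sketch: parseVoiceId and its return type are hypothetical helpers, not part of the library, and the reading of "m" as male is inferred from the example IDs.

```typescript
type VoiceLanguage = 'American English' | 'British English';
type VoiceGender = 'female' | 'male';

interface ParsedVoiceId {
  language: VoiceLanguage;
  gender: VoiceGender;
  name: string;
}

// Decode "[lang][gender]_[name]"; returns null for IDs outside the convention.
function parseVoiceId(id: string): ParsedVoiceId | null {
  const match = /^([ab])([fm])_(.+)$/.exec(id);
  if (!match) return null;
  const [, lang, gender, name] = match;
  return {
    language: lang === 'a' ? 'American English' : 'British English',
    gender: gender === 'f' ? 'female' : 'male',
    name,
  };
}

const emma = parseVoiceId('bf_emma');
const unknownVoice = parseVoiceId('xx_nobody');
```

Returning null rather than throwing lets a UI fall back to a default voice when it meets an ID it does not recognize.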

Speed Control

Adjust synthesis speed from 0.5x (slow) to 2.0x (fast):

const { audio } = await synthesizeSpeech({
  model,
  text: 'This is fast speech.',
  voice: 'af_heart',
  speed: 1.5,
});
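
When the speed comes from user input such as a slider or a stored preference, it may be worth clamping it to the documented 0.5 to 2.0 range before passing it on. The helper below is a hypothetical sketch. Whether the library itself validates or clamps out-of-range values is not stated here, so this only guards the calling side.

```typescript
// Documented range for the speed option.
const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;

// Clamp a requested speed into the documented range.
// Non-finite input (NaN, Infinity) falls back to normal speed.
function clampSpeed(requested: number): number {
  if (!Number.isFinite(requested)) return 1.0;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, requested));
}
```

The result can be passed directly as the speed option, for example speed: clampSpeed(sliderValue), where sliderValue is whatever number your UI produces.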

Streaming with streamSynthesizeSpeech

For real-time voice loops (e.g. an LLM reply being read aloud), use streamSynthesizeSpeech() — it splits the text into clauses and yields each clause's audio as soon as it finishes:

import { streamSynthesizeSpeech, playStreamedSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const kokoro = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

function speak(text: string) {
  const ctx = new AudioContext();
  const stream = streamSynthesizeSpeech({
    model: kokoro,
    text,
    voice: 'af_heart',
    speed: 1.0,
  });
  return playStreamedSpeech(stream, ctx);
}

For full API reference (options, result types, AbortSignal, custom splitters), see the Core Streaming Speech guide.
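
To make the clause-splitting idea concrete, here is a rough illustration of what a splitter does: it cuts text at punctuation so that each piece can be synthesized and played independently. This is not the library's actual splitter, and splitIntoClauses is a hypothetical name. A real splitter would also need to handle abbreviations and decimal numbers, which this naive version breaks apart.

```typescript
// Naive clause splitter: cut after runs of clause or sentence punctuation.
function splitIntoClauses(text: string): string[] {
  // Each match is a run of non-punctuation followed by any trailing punctuation.
  const parts = text.match(/[^.!?;:,]+[.!?;:,]*/g) ?? [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

const clauses = splitIntoClauses(
  'Hello there, this is a test. Is it working? Yes!',
);
```

Shorter pieces reduce the wait before the first audio is ready, at the cost of more synthesis calls and possibly less natural intonation across clause boundaries.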

Provider Options

Kokoro-specific options via providerOptions.kokoro:

Option  Type                                     Default  Description
dtype   'q8' | 'fp16' | 'fp32' | 'q4' | 'q4f16'  'q8'     Model quantization level

const { audio } = await synthesizeSpeech({
  model,
  text: 'High precision audio.',
  voice: 'af_heart',
  providerOptions: { kokoro: { dtype: 'fp16' } },
});
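
If the dtype comes from configuration or a query parameter rather than a literal, a small type guard can keep invalid values from reaching providerOptions. The names KOKORO_DTYPES, KokoroDtype, isKokoroDtype, and resolveDtype below are hypothetical and not exported by the library. The list of values simply mirrors the table above.

```typescript
// Allowed values, copied from the options table above.
const KOKORO_DTYPES = ['q8', 'fp16', 'fp32', 'q4', 'q4f16'] as const;
type KokoroDtype = (typeof KOKORO_DTYPES)[number];

// Type guard: narrows an arbitrary value to one of the allowed dtypes.
function isKokoroDtype(value: unknown): value is KokoroDtype {
  return (
    typeof value === 'string' &&
    (KOKORO_DTYPES as readonly string[]).includes(value)
  );
}

// Fall back to the documented default ('q8') for anything unrecognized.
function resolveDtype(value: unknown): KokoroDtype {
  return isKokoroDtype(value) ? value : 'q8';
}
```

The resolved value can then be used as providerOptions: { kokoro: { dtype: resolveDtype(userSetting) } }, where userSetting is whatever untrusted value your app receives.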

Best Practices

TTS Tips

  1. Use streaming for long text: streamSynthesizeSpeech() starts playback while later clauses are still being generated
  2. Pick the right voice: browse all 29 in the Voice Studio
  3. Speed 1.0 gives the most natural results: extreme speeds (0.5 or 2.0) may reduce naturalness
  4. q8 is the sweet spot: an 86MB download with no perceptible quality loss vs fp32 (326MB)

Showcase Apps

App                Description                                           Links
Audiobook Creator  Streaming TTS with voice selection and speed control  Demo · Source
Voice Studio       Browse, preview, and compare all 29 Kokoro voices     Demo · Source
