Text-to-Speech

Generate natural-sounding speech audio from text using Kokoro TTS models running locally in the browser.

For the full API reference (synthesizeSpeech(), options, result types, and custom providers), see the Core Audio guide.

See it in action

Try Audiobook Creator for a working demo.

Recommended Models

Model	Size	Quality	Use Case
`onnx-community/Kokoro-82M-v1.0-ONNX`	~82MB	High	General-purpose English TTS, natural prosody

Audiobook Creator Example

Based on the Audiobook Creator showcase app:

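import { transformers } from '@localmode/transformers';
import { synthesizeSpeech } from '@localmode/core';

const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

// Call controller.abort() (for example from a "Stop" button) to cancel synthesis
const controller = new AbortController();

// App-level helper, not part of the library: split on blank lines,
// then hard-wrap any paragraph that is still longer than maxLength
function splitText(text: string, maxLength: number): string[] {
  const segments: string[] = [];
  for (const paragraph of text.split(/\n\s*\n/)) {
    const trimmed = paragraph.trim();
    for (let i = 0; i < trimmed.length; i += maxLength) {
      segments.push(trimmed.slice(i, i + maxLength));
    }
  }
  return segments;
}

async function synthesizeChapter(text: string) {
  // Kokoro works best with shorter segments
  const maxLength = 5000;
  const segments = splitText(text, maxLength);
  const audioBlobs: Blob[] = [];

  for (const segment of segments) {
    const { audio } = await synthesizeSpeech({
      model,
      text: segment,
      abortSignal: controller.signal,
    });
    audioBlobs.push(audio);
  }

  // Combine audio blobs. This is a plain byte concatenation: each WAV blob
  // keeps its own header, so some players may stop after the first segment.
  return new Blob(audioBlobs, { type: 'audio/wav' });
}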
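
Concatenating WAV blobs byte-for-byte leaves every segment's header in the stream, so the size fields in the first header describe only the first segment. One possible alternative is sketched below. It assumes each segment is a standard RIFF/WAVE file and that all segments share the same sample rate, channel count, and bit depth. It keeps the first header, appends only the sample data from each segment, and patches the two size fields. The names findDataChunk and mergeWavBuffers are hypothetical helpers for illustration, not part of @localmode.

```typescript
// Locate the "data" chunk in a RIFF/WAVE buffer by walking its chunk list.
function findDataChunk(view: DataView): { offset: number; size: number } {
  let pos = 12; // skip "RIFF", the RIFF size field, and "WAVE"
  while (pos + 8 <= view.byteLength) {
    const id = String.fromCharCode(
      view.getUint8(pos),
      view.getUint8(pos + 1),
      view.getUint8(pos + 2),
      view.getUint8(pos + 3),
    );
    const size = view.getUint32(pos + 4, true);
    if (id === 'data') {
      // Clamp in case the declared size runs past the end of the buffer
      return { offset: pos + 8, size: Math.min(size, view.byteLength - pos - 8) };
    }
    pos += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No data chunk found; is this a WAV buffer?');
}

// Merge several WAV buffers into one, assuming they all share the same format.
function mergeWavBuffers(buffers: ArrayBuffer[]): Uint8Array {
  if (buffers.length === 0) throw new Error('Nothing to merge');
  const chunks = buffers.map((b) => findDataChunk(new DataView(b)));
  const totalData = chunks.reduce((sum, c) => sum + c.size, 0);
  const headerLength = chunks[0].offset; // everything before the first sample

  const out = new Uint8Array(headerLength + totalData);
  // Reuse the first segment's header (format, sample rate, channels)
  out.set(new Uint8Array(buffers[0], 0, headerLength), 0);

  let writePos = headerLength;
  buffers.forEach((b, i) => {
    out.set(new Uint8Array(b, chunks[i].offset, chunks[i].size), writePos);
    writePos += chunks[i].size;
  });

  const view = new DataView(out.buffer);
  view.setUint32(4, out.byteLength - 8, true); // RIFF chunk size
  view.setUint32(headerLength - 4, totalData, true); // data chunk size
  return out;
}
```

To apply this to the collected blobs, one option is to read each with `await blob.arrayBuffer()`, pass the buffers to `mergeWavBuffers`, and wrap the result in a new Blob with type 'audio/wav'.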

Best Practices

TTS Tips

Keep text segments short: Kokoro works best with text under 5000 characters
Use WAV output: the audio is returned as a WAV blob, ready for playback
Split long text: for books or articles, split into paragraphs and synthesize each one
Support cancellation: TTS can take several seconds per segment, so pass an abortSignal that lets users stop a long job
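
The splitting tips above can be combined in a small helper. The following is a minimal sketch, not a library API: splitIntoSegments is a hypothetical name, and the sentence detection (punctuation followed by whitespace) is a simplifying assumption that will not suit every text. It splits on blank lines first, then packs whole sentences into segments no longer than maxLength, and only hard-wraps a sentence that exceeds the limit on its own.

```typescript
// Split text into TTS-sized segments: paragraphs first, then whole sentences,
// packed greedily so that no segment exceeds maxLength characters.
function splitIntoSegments(text: string, maxLength = 5000): string[] {
  const segments: string[] = [];

  for (const paragraph of text.split(/\n\s*\n/)) {
    // Mark sentence boundaries: ., ! or ? followed by whitespace
    const sentences = paragraph
      .trim()
      .replace(/([.!?])\s+/g, '$1\n')
      .split('\n');

    let current = '';
    for (const sentence of sentences) {
      if (sentence.length === 0) continue;
      const candidate = current ? `${current} ${sentence}` : sentence;
      if (candidate.length <= maxLength) {
        current = candidate;
        continue;
      }
      if (current) segments.push(current);

      // A single sentence longer than the limit is hard-wrapped
      let rest = sentence;
      while (rest.length > maxLength) {
        segments.push(rest.slice(0, maxLength));
        rest = rest.slice(maxLength);
      }
      current = rest;
    }
    if (current) segments.push(current);
  }

  return segments;
}
```

Because segments never span a paragraph break, each paragraph can be synthesized and cancelled independently, at the cost of producing some segments well under the limit.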

Limitations

Kokoro currently supports English only. Multiple voice styles are available via the model's built-in speaker options.