Text-to-Speech

Generate natural-sounding speech audio from text using Kokoro TTS models running locally in the browser.

For the full API reference (synthesizeSpeech(), options, result types, and custom providers), see the Core Audio guide.

See it in action

Try Audiobook Creator for a working demo.

| Model | Size | Quality | Use Case |
| --- | --- | --- | --- |
| onnx-community/Kokoro-82M-v1.0-ONNX | ~82MB | High | General-purpose English TTS, natural prosody |

Audiobook Creator Example

Based on the Audiobook Creator showcase app:

import { transformers } from '@localmode/transformers';
import { synthesizeSpeech } from '@localmode/core';

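const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

// Call controller.abort() (for example from a "Cancel" button) to stop synthesis
const controller = new AbortController();

async function synthesizeChapter(text: string) {
  // Kokoro works best with shorter segments.
  // splitText is an app-defined helper (not a LocalMode export) that breaks
  // the text into chunks of at most maxLength characters.
  const maxLength = 5000;
  const segments = splitText(text, maxLength);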
  const audioBlobs: Blob[] = [];

  for (const segment of segments) {
    const { audio } = await synthesizeSpeech({
      model,
      text: segment,
      abortSignal: controller.signal,
    });
    audioBlobs.push(audio);
  }

  // Combine audio blobs
  return new Blob(audioBlobs, { type: 'audio/wav' });
}
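
The final step above concatenates the segment blobs byte for byte. If each segment is a complete WAV file with its own header, some players may stop after the first segment, because that header only describes the first segment's length. The sketch below shows one way to merge them into a single WAV. It is not part of LocalMode: mergeWavBuffers and findDataChunk are hypothetical helper names, and the code assumes uncompressed PCM WAV segments that all share the same sample rate and channel layout.

```typescript
// Locate the "data" chunk inside a RIFF/WAVE buffer by walking its chunks.
function findDataChunk(bytes: Uint8Array): { offset: number; length: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let pos = 12; // skip "RIFF", the RIFF size field, and "WAVE"
  while (pos + 8 <= bytes.byteLength) {
    const id = String.fromCharCode(
      bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3],
    );
    const size = view.getUint32(pos + 4, true); // chunk sizes are little-endian
    if (id === 'data') {
      const available = bytes.byteLength - (pos + 8);
      return { offset: pos + 8, length: Math.min(size, available) };
    }
    pos += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No data chunk found in WAV buffer');
}

// Merge several WAV buffers into one, reusing the first buffer's header.
// Assumption: every buffer uses the same PCM format as the first one.
function mergeWavBuffers(buffers: Uint8Array[]): Uint8Array {
  if (buffers.length === 0) throw new Error('Nothing to merge');

  const chunks = buffers.map(findDataChunk);
  const headerLength = chunks[0].offset; // everything before the first sample
  const totalData = chunks.reduce((sum, c) => sum + c.length, 0);

  const out = new Uint8Array(headerLength + totalData);
  out.set(buffers[0].subarray(0, headerLength), 0);

  let cursor = headerLength;
  buffers.forEach((buf, i) => {
    const { offset, length } = chunks[i];
    out.set(buf.subarray(offset, offset + length), cursor);
    cursor += length;
  });

  // Patch the two size fields so the header describes the merged audio.
  const view = new DataView(out.buffer);
  view.setUint32(4, out.byteLength - 8, true); // RIFF chunk size
  view.setUint32(headerLength - 4, totalData, true); // data chunk size
  return out;
}
```

To use it with the example above, read each blob with blob.arrayBuffer(), wrap the result in a Uint8Array, pass the array of buffers to mergeWavBuffers, and build the final audio/wav Blob from the returned bytes.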

Best Practices

TTS Tips

  1. Keep text segments short: Kokoro works best with text under 5000 characters.
  2. Use WAV output: the audio is returned as a WAV blob, ready for playback.
  3. Split long text: for books or articles, split into paragraphs and synthesize each one separately.
  4. Support cancellation: synthesis can take several seconds per segment, so pass an abortSignal that lets users stop a long job.
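
The example earlier on this page calls a splitText helper without defining it. As a rough illustration of tips 1 and 3, here is one possible version. It is an assumption, not the showcase app's actual code: it splits on blank lines, packs whole paragraphs into a segment while they fit, falls back to sentence boundaries for oversized paragraphs, and hard-cuts only when a single sentence exceeds the limit.

```typescript
// Hypothetical helper: split text into segments of at most maxLength characters,
// preferring paragraph boundaries, then sentence boundaries, then hard cuts.
function splitText(text: string, maxLength: number): string[] {
  if (maxLength <= 0) throw new Error('maxLength must be positive');

  const segments: string[] = [];
  let current = '';

  // Paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  for (const paragraph of paragraphs) {
    // Only break a paragraph into sentences when it is too long on its own.
    const pieces =
      paragraph.length <= maxLength
        ? [paragraph]
        : paragraph.split(/(?<=[.!?])\s+/);

    for (const rawPiece of pieces) {
      let piece = rawPiece.trim();

      // A single sentence longer than the limit is cut at maxLength.
      while (piece.length > maxLength) {
        if (current) {
          segments.push(current);
          current = '';
        }
        segments.push(piece.slice(0, maxLength));
        piece = piece.slice(maxLength).trimStart();
      }
      if (!piece) continue;

      // Append to the current segment if it still fits, else start a new one.
      const joined = current ? `${current} ${piece}` : piece;
      if (joined.length <= maxLength) {
        current = joined;
      } else {
        segments.push(current);
        current = piece;
      }
    }
  }

  if (current) segments.push(current);
  return segments;
}
```

Packing several short paragraphs into one segment keeps the number of synthesis calls down, while the sentence fallback avoids cutting words in the middle wherever possible.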

Limitations

Kokoro currently supports English only. Multiple voice styles are available via the model's built-in speaker options.

Showcase Apps

| App | Description | Links |
| --- | --- | --- |
| Audiobook Creator | Generate natural speech from text content | Demo · Source |
