Text-to-Speech
Generate natural-sounding speech audio from text using Kokoro TTS models running locally in the browser.
For the full API reference (synthesizeSpeech(), options, result types, and custom providers), see the Core Audio guide.
See it in action
Try Audiobook Creator for a working demo.
Recommended Models
| Model | Size | Quality | Use Case |
|---|---|---|---|
| onnx-community/Kokoro-82M-v1.0-ONNX | ~82MB | High | General-purpose English TTS, natural prosody |
Audiobook Creator Example
Based on the Audiobook Creator showcase app:
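
    import { transformers } from '@localmode/transformers';
    import { synthesizeSpeech } from '@localmode/core';

    const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

    // Call controller.abort() (for example from a Cancel button) to stop synthesis.
    const controller = new AbortController();

    // splitText is an app-level helper you supply (it is not imported from the
    // library here); it should return chunks of at most maxLength characters.
    async function synthesizeChapter(text: string) {
      // Kokoro works best with shorter segments
      const maxLength = 5000;
      const segments = splitText(text, maxLength);
      const audioBlobs: Blob[] = [];

      for (const segment of segments) {
        const { audio } = await synthesizeSpeech({
          model,
          text: segment,
          abortSignal: controller.signal,
        });
        audioBlobs.push(audio);
      }

      // Combine audio blobs. Each blob carries its own WAV header, so a plain
      // concatenation may play only the first segment in some players.
      return new Blob(audioBlobs, { type: 'audio/wav' });
    }

Best Practices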
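The last step of the example above concatenates the per-segment blobs directly. Since each WAV blob begins with its own header, some players may stop after the first segment. Below is a minimal sketch of one way to merge segments under a single header. It assumes every segment is an uncompressed RIFF/WAVE file with the same sample rate, channel count, and bit depth (plausible when all segments come from the same model, but not something this guide confirms). `findDataChunk` and `mergeWavBuffers` are hypothetical helper names, not part of `@localmode/core`.

```typescript
// Hypothetical helper: locate the "data" chunk by walking the RIFF chunk list.
function findDataChunk(view: DataView): { offset: number; size: number } {
  const tag = (pos: number): string =>
    String.fromCharCode(
      view.getUint8(pos),
      view.getUint8(pos + 1),
      view.getUint8(pos + 2),
      view.getUint8(pos + 3),
    );
  if (view.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }
  let pos = 12;
  while (pos + 8 <= view.byteLength) {
    const size = view.getUint32(pos + 4, true);
    if (tag(pos) === 'data') {
      // Clamp in case the declared size overruns the buffer.
      return { offset: pos + 8, size: Math.min(size, view.byteLength - pos - 8) };
    }
    pos += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No data chunk found');
}

// Hypothetical helper: keep the first file's header, append every file's
// sample data, then patch the two size fields. Assumes identical audio formats.
function mergeWavBuffers(buffers: ArrayBuffer[]): ArrayBuffer {
  if (buffers.length === 0) throw new Error('Nothing to merge');
  const parts = buffers.map((buf) => {
    const { offset, size } = findDataChunk(new DataView(buf));
    return {
      header: new Uint8Array(buf, 0, offset),
      samples: new Uint8Array(buf, offset, size),
    };
  });
  const header = parts[0].header;
  const total = parts.reduce((sum, p) => sum + p.samples.byteLength, 0);

  const outBuffer = new ArrayBuffer(header.byteLength + total);
  const out = new Uint8Array(outBuffer);
  out.set(header, 0);
  let pos = header.byteLength;
  for (const p of parts) {
    out.set(p.samples, pos);
    pos += p.samples.byteLength;
  }

  const view = new DataView(outBuffer);
  view.setUint32(4, outBuffer.byteLength - 8, true); // RIFF chunk size
  view.setUint32(header.byteLength - 4, total, true); // data chunk size
  return outBuffer;
}
```

In the browser, `await blob.arrayBuffer()` yields the input for each segment, and `new Blob([merged], { type: 'audio/wav' })` wraps the merged result for playback or download. The sketch does not check that the segments actually share a format, so mixing sources would produce distorted audio.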
TTS Tips
- Keep text segments short: Kokoro works best with text under 5000 characters
- Use WAV output: the audio is returned as a WAV blob, ready for playback
- Split long text: for books or articles, split into paragraphs and synthesize each one
- Support cancellation: TTS can take several seconds per segment, so pass an abortSignal the user can trigger
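The splitting tip can be sketched as a small helper. This is a hypothetical `splitText` (the name matches the one used in the Audiobook Creator example, but this implementation is an assumption, not the showcase app's code). It packs whole paragraphs into chunks of at most `maxLength` characters, falls back to sentence boundaries for oversized paragraphs, and hard-cuts only when a single sentence is still too long.

```typescript
// Hypothetical helper: split text into chunks of at most maxLength characters,
// preferring paragraph boundaries, then sentence boundaries, then a hard cut.
function splitText(text: string, maxLength: number): string[] {
  if (maxLength <= 0) throw new RangeError('maxLength must be positive');
  const chunks: string[] = [];
  let current = '';

  // Emit the chunk being built, if it has any content.
  const flush = (): void => {
    if (current.trim().length > 0) chunks.push(current.trim());
    current = '';
  };

  // Add a piece to the current chunk, starting a new chunk when it would overflow.
  const append = (piece: string, joiner: string): void => {
    const candidate = current ? current + joiner + piece : piece;
    if (candidate.length <= maxLength) {
      current = candidate;
      return;
    }
    flush();
    // Hard-cut pieces that are still too long on their own.
    let rest = piece;
    while (rest.length > maxLength) {
      chunks.push(rest.slice(0, maxLength));
      rest = rest.slice(maxLength);
    }
    current = rest;
  };

  for (const paragraph of text.split(/\n\s*\n/)) {
    const trimmed = paragraph.trim();
    if (!trimmed) continue;
    if (trimmed.length <= maxLength) {
      append(trimmed, '\n\n');
      continue;
    }
    // Oversized paragraph: mark sentence ends (punctuation followed by
    // whitespace) with a newline, then split on newlines.
    const sentences = trimmed.replace(/([.!?]+)\s+/g, '$1\n').split('\n');
    for (const sentence of sentences) {
      if (sentence.trim()) append(sentence.trim(), ' ');
    }
  }
  flush();
  return chunks;
}
```

Splitting on sentence ends rather than at a fixed character offset keeps each segment a complete utterance, which tends to matter for prosody. The sentence detection here is deliberately simple and will misfire on abbreviations such as "Dr. Smith".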
Limitations
Kokoro currently supports English only. Multiple voice styles are available via the model's built-in speaker options.