Audio
Speech-to-text transcription and text-to-speech synthesis.
Transcribe speech to text and synthesize speech from text, entirely in the browser. No servers, no API keys — audio never leaves the device.
See it in action: try Voice Notes and Meeting Assistant for working demos of these APIs.
transcribe()
Transcribe audio to text using a speech-to-text model:
```ts
import { transcribe } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.speechToText('onnx-community/moonshine-tiny-ONNX');

const { text, usage, response } = await transcribe({
  model,
  audio: audioBlob,
});

console.log(text); // "Hello, world!"
console.log(usage.audioDurationSec); // 3.5
console.log(response.modelId); // 'onnx-community/moonshine-tiny-ONNX'
```

Pass `returnTimestamps: true` to receive per-segment timestamps alongside the text:

```ts
import { transcribe } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.speechToText('onnx-community/moonshine-tiny-ONNX');

const { text, segments } = await transcribe({
  model,
  audio: audioBlob,
  returnTimestamps: true,
});

segments?.forEach((seg) => {
  console.log(`[${seg.start}s - ${seg.end}s] ${seg.text}`);
});
```

Pass an `abortSignal` to cancel a long-running transcription:

```ts
const controller = new AbortController();
setTimeout(() => controller.abort(), 30000); // Cancel after 30s

try {
  const { text } = await transcribe({
    model,
    audio: longAudioBlob,
    abortSignal: controller.signal,
  });
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Transcription was cancelled');
  }
}
```

TranscribeOptions
| Prop | Type |
| --- | --- |
| model | SpeechToTextModel |
| audio | AudioInput |
| language | string (optional) |
| task | 'transcribe' \| 'translate' (optional) |
| returnTimestamps | boolean (optional) |
| abortSignal | AbortSignal (optional) |

TranscribeResult

| Prop | Type |
| --- | --- |
| text | string |
| segments | TranscriptionSegment[] \| undefined |
| language | string |
| usage | { audioDurationSec: number; durationMs: number } |
| response | response metadata, including modelId |

TranscriptionSegment

| Prop | Type |
| --- | --- |
| text | string |
| start | number (seconds) |
| end | number (seconds) |
| confidence | number |
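Segment timestamps map directly onto caption formats. As an illustration, here is a minimal SubRip (SRT) formatter for segment values; `srtTime` and `toSrt` are hypothetical helpers, not part of @localmode:

```typescript
interface TranscriptionSegment {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Build an SRT document from transcription segments.
function toSrt(segments: TranscriptionSegment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}

console.log(toSrt([{ text: 'Hello, world!', start: 0, end: 2.5 }]));
// 1
// 00:00:00,000 --> 00:00:02,500
// Hello, world!
```

The resulting string can be saved as a `.srt` file or converted to WebVTT by swapping the comma for a dot in the timestamps.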
Language & Translation
Specify a language hint or translate non-English audio to English:
```ts
// Transcribe German audio with a language hint
const { text, language } = await transcribe({
  model,
  audio: germanAudioBlob,
  language: 'de',
});

// Translate French audio to English
const { text: englishText } = await transcribe({
  model,
  audio: frenchAudioBlob,
  task: 'translate',
});
```

synthesizeSpeech()
Synthesize speech from text using a text-to-speech model:
```ts
import { synthesizeSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

const { audio, sampleRate, usage } = await synthesizeSpeech({
  model,
  text: 'Hello, how are you today?',
});

// Play the audio
const audioUrl = URL.createObjectURL(audio);
const audioElement = new Audio(audioUrl);
audioElement.play();

console.log(`Sample rate: ${sampleRate}Hz`);
console.log(`Generated ${usage.characterCount} characters in ${usage.durationMs}ms`);
```

Adjust the voice, speaking speed, and pitch:

```ts
import { synthesizeSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.textToSpeech('onnx-community/Kokoro-82M-v1.0-ONNX');

const { audio } = await synthesizeSpeech({
  model,
  text: 'Welcome to the application.',
  voice: 'af_heart',
  speed: 1.2,
  pitch: 1.0,
});

const audioUrl = URL.createObjectURL(audio);
const audioElement = new Audio(audioUrl);
audioElement.play();
```

Pass an `abortSignal` to cancel long-running synthesis:

```ts
const controller = new AbortController();
setTimeout(() => controller.abort(), 10000); // Cancel after 10s

try {
  const { audio } = await synthesizeSpeech({
    model,
    text: longArticleText,
    abortSignal: controller.signal,
  });
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Speech synthesis was cancelled');
  }
}
```

SynthesizeSpeechOptions
| Prop | Type |
| --- | --- |
| model | TextToSpeechModel |
| text | string |
| voice | string (optional) |
| speed | number (optional) |
| pitch | number (optional) |
| abortSignal | AbortSignal (optional) |

SynthesizeSpeechResult

| Prop | Type |
| --- | --- |
| audio | Blob |
| sampleRate | number |
| usage | { characterCount: number; durationMs: number } |
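For long inputs like the `longArticleText` abort example above, an alternative to cancelling a single large request is to split the text into sentence-sized chunks and synthesize them one at a time, so cancellation takes effect between chunks. A sketch, where `chunkText` is an illustrative helper and not part of @localmode:

```typescript
// Split text into chunks of at most maxLen characters, breaking on
// sentence boundaries where possible.
function chunkText(text: string, maxLen = 300): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Usage sketch with synthesizeSpeech:
// for (const chunk of chunkText(longArticleText)) {
//   if (controller.signal.aborted) break; // stop between chunks
//   const { audio } = await synthesizeSpeech({ model, text: chunk });
//   // queue audio for playback
// }

console.log(chunkText('First sentence. Second sentence.', 20));
// [ 'First sentence.', 'Second sentence.' ]
```

Chunking also lets playback begin as soon as the first chunk is ready instead of waiting for the whole article to be synthesized.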
Audio Input Types
The AudioInput type accepts three formats: Blob | ArrayBuffer | Float32Array. Use whichever is most convenient for your source — file inputs produce Blob, fetch responses give ArrayBuffer, and the Web Audio API provides Float32Array.
```ts
// From a file input
const blob = fileInput.files[0]; // Blob
await transcribe({ model, audio: blob });

// From a fetch response
const buffer = await fetch('/audio.wav').then((r) => r.arrayBuffer()); // ArrayBuffer
await transcribe({ model, audio: buffer });

// From the Web Audio API
const audioContext = new AudioContext();
const audioBuffer = await audioContext.decodeAudioData(buffer.slice(0));
const samples = audioBuffer.getChannelData(0); // Float32Array
await transcribe({ model, audio: samples });
```

Custom Provider
Implement the SpeechToTextModel and TextToSpeechModel interfaces to create custom audio providers:
SpeechToTextModel
```ts
import type { SpeechToTextModel, DoTranscribeOptions, DoTranscribeResult } from '@localmode/core';

class MySpeechToText implements SpeechToTextModel {
  readonly modelId = 'custom:my-stt';
  readonly provider = 'custom';
  readonly languages = ['en', 'de', 'fr'];

  async doTranscribe(options: DoTranscribeOptions): Promise<DoTranscribeResult> {
    const { audio, language, returnTimestamps, abortSignal } = options;

    // Your transcription logic here
    const startTime = performance.now();

    return {
      text: 'Transcribed text...',
      segments: returnTimestamps
        ? [{ text: 'Transcribed text...', start: 0, end: 2.5, confidence: 0.95 }]
        : undefined,
      language: language ?? 'en',
      usage: {
        audioDurationSec: 2.5,
        durationMs: performance.now() - startTime,
      },
    };
  }
}
```

TextToSpeechModel
```ts
import type { TextToSpeechModel, DoSynthesizeOptions, DoSynthesizeResult } from '@localmode/core';

class MyTextToSpeech implements TextToSpeechModel {
  readonly modelId = 'custom:my-tts';
  readonly provider = 'custom';
  readonly voices = ['default', 'narrator'];

  async doSynthesize(options: DoSynthesizeOptions): Promise<DoSynthesizeResult> {
    const { text, voice, speed, abortSignal } = options;

    // Your synthesis logic here
    const startTime = performance.now();
    const audioBlob = new Blob([], { type: 'audio/wav' });

    return {
      audio: audioBlob,
      sampleRate: 24000,
      usage: {
        characterCount: text.length,
        durationMs: performance.now() - startTime,
      },
    };
  }
}
```

For recommended models, provider-specific options, and practical recipes, see the Speech-to-Text and Text-to-Speech Transformers provider guides.
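The `doSynthesize` sketch above returns an empty `audio/wav` Blob. A custom provider that produces raw Float32Array samples needs to wrap them in a WAV container before returning them; a minimal 16-bit PCM mono encoder might look like the following (`encodeWav` is an illustrative helper, not part of @localmode):

```typescript
// Encode mono Float32Array samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true);

  // Clamp and scale each sample to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// In doSynthesize, wrap the encoded buffer for DoSynthesizeResult:
// const audio = new Blob([encodeWav(samples, 24000)], { type: 'audio/wav' });
```

The resulting Blob plays directly in an Audio element or via URL.createObjectURL, matching what the transformers TTS provider returns.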
Next Steps

- Speech-to-Text (Transformers): Transcribe audio with Whisper and Moonshine models.
- Text-to-Speech (Transformers): Synthesize speech with Kokoro and other TTS models.
- Text Generation: Generate and stream text with language models.
- Middleware: Add retry, caching, and logging to any function.