Speech-to-Text
Transcribe audio to text using Moonshine models running locally in the browser. Optimized for fast, efficient speech recognition with optional timestamps.
For full API reference (transcribe(), options, result types, and custom providers), see the Core Audio guide.
See it in action
Try Voice Notes and Meeting Assistant for working demos.
Recommended Models
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| onnx-community/moonshine-tiny-ONNX | ~50MB | ⚡⚡⚡ | Good | Quick transcription, voice notes |
| onnx-community/moonshine-base-ONNX | ~237MB | ⚡⚡ | Better | Higher accuracy |
Voice Notes Example
Based on the Voice Notes showcase app:
import { transformers } from '@localmode/transformers';
import { transcribe } from '@localmode/core';

const model = transformers.speechToText('onnx-community/moonshine-tiny-ONNX');

// Lets the caller cancel an in-flight transcription via controller.abort()
const controller = new AbortController();

// Record audio from microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
const chunks: Blob[] = [];
recorder.ondataavailable = (e) => chunks.push(e.data);

// Register the stop handler before stopping the recorder
recorder.onstop = async () => {
  const audioBlob = new Blob(chunks, { type: 'audio/webm' });
  const { text } = await transcribe({
    model,
    audio: audioBlob,
    abortSignal: controller.signal,
  });
  console.log('Transcription:', text);
};

recorder.start();

// Stop recording after some time...
recorder.stop();

With Timestamps
const { text, segments } = await transcribe({
  model,
  audio: audioBlob,
  returnTimestamps: true,
});

segments?.forEach((s) => {
  console.log(`[${s.start.toFixed(1)}s - ${s.end.toFixed(1)}s] ${s.text}`);
});

Audio Input Formats
The audio parameter accepts:
- Blob: from MediaRecorder, file input, or fetch
- ArrayBuffer: raw audio data
- Float32Array: PCM audio samples
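When audio arrives as signed 16-bit PCM (for example from a decoded WAV file), it has to be turned into floating-point samples before it can be supplied as a Float32Array. The sketch below shows one conventional way to do that. `int16ToFloat32` is a hypothetical helper, not part of `@localmode`, and it assumes the model expects samples normalized to the range [-1, 1); the expected sample rate is not stated here, so resampling is left out.

```typescript
// Hypothetical helper (not part of @localmode): converts signed 16-bit PCM
// samples to floats in the range [-1, 1).
function int16ToFloat32(pcm: Int16Array): Float32Array {
  // Dividing by 32768 maps the minimum Int16 value (-32768) to exactly -1.0
  return Float32Array.from(pcm, (sample) => sample / 32768);
}

// Example input: silence, half-scale positive, full-scale negative
const samples = int16ToFloat32(new Int16Array([0, 16384, -32768]));
console.log(samples.length); // 3
```

The resulting array could then be passed as the `audio` argument in place of a Blob.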
Best Practices
STT Tips
- Start with moonshine-tiny: fast to download (~50MB) and good enough for most use cases
- Use WebM or OGG: these formats work well with MediaRecorder
- Support cancellation: transcription can take time, so always pass abortSignal
- Handle permissions: request microphone access gracefully, with a fallback UI
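Not every browser records both WebM and OGG, so it can help to probe for a supported container before constructing the recorder. The following is a minimal sketch of that check. `pickMimeType` and the `CANDIDATES` list are hypothetical names introduced for illustration; the support predicate is injected so the function can also run outside a browser, and in a browser it would typically wrap the standard `MediaRecorder.isTypeSupported()`.

```typescript
// Candidate container formats in order of preference (an illustrative list).
const CANDIDATES: string[] = ['audio/webm', 'audio/ogg'];

// Hypothetical helper: returns the first candidate accepted by the predicate,
// or undefined so the caller can let the browser choose its default format.
function pickMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: string[] = CANDIDATES,
): string | undefined {
  return candidates.find((type) => isSupported(type));
}

// In a browser, the predicate would wrap the standard MediaRecorder check:
// const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Passing the predicate in, rather than calling `MediaRecorder` directly, keeps the selection logic easy to unit test.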
Showcase Apps
| App | Description | Links |
|---|---|---|
| Voice Notes | Record and transcribe audio with Moonshine models | Demo · Source |
| Meeting Assistant | Transcribe meeting recordings for summarization | Demo · Source |
Live streaming
For microphone-driven streaming transcription with Voice Activity Detection (push-to-talk and open-mic modes), see /docs/core/live-transcribe. It builds on top of any SpeechToTextModel returned by transformers.speechToText() — no model changes required.
Recommended models for live mode
| Model | Size | Latency | Notes |
|---|---|---|---|
| onnx-community/moonshine-tiny-ONNX | ~50MB | Lowest | Best for live push-to-talk; 30s window |
| onnx-community/moonshine-base-ONNX | ~237MB | Low | Higher quality, still fast on WebGPU |
| Xenova/whisper-tiny | ~70MB | Medium | Multilingual; chunk via maxUtteranceSec |
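The window and chunking notes in the table come down to bounding how much audio is handed to the model at once. The sketch below illustrates the idea on raw samples only; it is not how `maxUtteranceSec` is implemented. `chunkSamples` is a hypothetical helper, and the 16 kHz default sample rate is an assumption made for the example.

```typescript
// Hypothetical helper: splits mono PCM samples into consecutive windows of at
// most `windowSec` seconds. The 16 kHz default is an assumption for illustration.
function chunkSamples(
  samples: Float32Array,
  windowSec: number,
  sampleRate: number = 16000,
): Float32Array[] {
  const windowSize = Math.floor(windowSec * sampleRate);
  if (!(windowSize > 0)) {
    throw new RangeError('windowSec * sampleRate must be at least one sample');
  }
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += windowSize) {
    // subarray returns a view over the same buffer, so no samples are copied
    chunks.push(samples.subarray(start, start + windowSize));
  }
  return chunks;
}

// 70 seconds of audio at the assumed rate, split into 30-second windows
const windows = chunkSamples(new Float32Array(70 * 16000), 30);
console.log(windows.map((w) => w.length / 16000)); // [30, 30, 10]
```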
For VAD, use transformers.vad('onnx-community/silero-vad') (~1.8MB) for production-quality open-mic. The built-in energy VAD covers push-to-talk with no extra downloads.
import { transformers } from '@localmode/transformers';
import { createLiveTranscriber } from '@localmode/core';

const transcriber = await createLiveTranscriber({
  model: transformers.speechToText('onnx-community/moonshine-tiny-ONNX'),
  mode: 'open-mic',
  vad: transformers.vad('onnx-community/silero-vad'),
});

// Log the text of each utterance as it completes
transcriber.onUtteranceEnd((u) => console.log(u.text));
await transcriber.start();
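For intuition about what an energy-based VAD does, the sketch below classifies a frame as speech when its root-mean-square energy crosses a threshold. This is a generic illustration and not the library's built-in energy VAD. `rmsEnergy`, `isSpeechFrame`, and the 0.02 default threshold are all hypothetical; a real threshold would need tuning per microphone and environment.

```typescript
// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rmsEnergy(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, sample) => acc + sample * sample, 0);
  return Math.sqrt(sumSquares / frame.length);
}

// Flags a frame as speech when its energy exceeds the threshold.
// The 0.02 default is an arbitrary illustrative value, not a library setting.
function isSpeechFrame(frame: Float32Array, threshold: number = 0.02): boolean {
  return rmsEnergy(frame) > threshold;
}

console.log(isSpeechFrame(new Float32Array(160))); // false: an all-zero frame is silence
```

A simple energy check like this tends to misfire on background noise, which is consistent with the guidance above to prefer a model-based VAD for open-mic use.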