Image Captioning
Generate natural language descriptions of images using vision-language models like Florence-2.
For the full API reference (`captionImage()`, options, result types, and custom providers), see the Core Vision guide.
See it in action
Try Image Captioner for a working demo.
Recommended Models
| Model | Size | Speed | Use Case |
|---|---|---|---|
| `onnx-community/Florence-2-base-ft` | ~460MB | ⚡⚡ | Captioning, OCR, detection, and document QA |
File Upload Example
Based on the Image Captioner showcase app:
```typescript
import { transformers } from '@localmode/transformers';
import { captionImage } from '@localmode/core';

// Load the model once and reuse it across captions
const model = transformers.captioner('onnx-community/Florence-2-base-ft');

async function handleImageUpload(file: File) {
  // Allows cancelling an in-flight caption request
  const controller = new AbortController();

  // Read the uploaded file as a base64 data URL
  const dataUrl = await new Promise<string>((resolve) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.readAsDataURL(file);
  });

  const { caption } = await captionImage({
    model,
    image: dataUrl,
    abortSignal: controller.signal,
  });

  return caption;
}
```
Image Input Formats
The `image` parameter accepts:
- `string` — Data URL (`data:image/jpeg;base64,...`) or regular URL
- `Blob` — Image blob from a file input or `fetch()`
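For reference, a data URL has the shape `data:<mime>;base64,<payload>`. The sketch below builds one from raw bytes; `toDataUrl` is a hypothetical helper, not part of the library, and it uses Node's `Buffer` for base64 encoding (in a browser you would typically let `FileReader.readAsDataURL` do this, as in the upload example):

```typescript
// Hypothetical helper: build a data URL from raw image bytes.
// Mirrors what FileReader.readAsDataURL produces for a Blob.
// Uses Node's Buffer for base64; a browser would use FileReader/btoa.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// toDataUrl(new Uint8Array([72, 105]), 'text/plain')
// → 'data:text/plain;base64,SGk='
```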
Best Practices
Captioning Tips
- Use JPEG/PNG/WebP — These formats are well-supported
- Resize large images — Smaller images process faster with similar quality
- Cache the model — Load once, caption many images
- Handle errors — Invalid or corrupted images will throw
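One way to act on the resize tip above is to cap the longest side before captioning. The dimension math can be sketched as below; `fitWithin` is an illustrative helper, not part of the library, and the actual pixel work would go through a `<canvas>` or `createImageBitmap`:

```typescript
// Illustrative helper: scale (width, height) down so the longest side
// is at most maxDim, preserving aspect ratio. Never upscales.
function fitWithin(
  width: number,
  height: number,
  maxDim: number,
): { width: number; height: number } {
  const scale = Math.min(1, maxDim / Math.max(width, height));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// fitWithin(4000, 3000, 1024) → { width: 1024, height: 768 }
// fitWithin(800, 600, 1024)   → { width: 800, height: 600 } (no upscaling)
```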