
Image Captioning

Generate natural language descriptions of images using vision-language models like Florence-2.

For full API reference (captionImage(), options, result types, and custom providers), see the Core Vision guide.

See it in action

Try Image Captioner for a working demo.

| Model | Size | Speed | Use Case |
|---|---|---|---|
| onnx-community/Florence-2-base-ft | ~460MB | ⚡⚡ | Captioning, OCR, detection, and document QA |

File Upload Example

Based on the Image Captioner showcase app:

import { transformers } from '@localmode/transformers';
import { captionImage } from '@localmode/core';

const model = transformers.captioner('onnx-community/Florence-2-base-ft');
const controller = new AbortController();

async function handleImageUpload(file: File) {
  // Read the file as a base64 data URL
  const dataUrl = await new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });

  const { caption } = await captionImage({
    model,
    image: dataUrl,
    abortSignal: controller.signal,
  });

  return caption;
}

Image Input Formats

The image parameter accepts:

  • string — Data URL (data:image/jpeg;base64,...) or regular URL
  • Blob — Image blob from file input or fetch
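If you already have a Blob but want to pass a data URL (or need to normalize inputs in one place), it can be converted without FileReader. The helper below is a sketch, not part of the library API:

```typescript
// Hypothetical helper: convert a Blob to a data URL suitable for the
// `image` parameter of captionImage(). Works in browsers and Node 18+.
async function blobToDataUrl(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  // btoa encodes the binary string as base64
  return `data:${blob.type};base64,${btoa(binary)}`;
}
```

For example, `blobToDataUrl(await (await fetch(url)).blob())` turns a fetched image into a data URL; passing the Blob directly is equally valid.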

Best Practices

Captioning Tips

  1. Use JPEG/PNG/WebP — These formats are well-supported
  2. Resize large images — Smaller images process faster with similar quality
  3. Cache the model — Load once, caption many images
  4. Handle errors — Invalid or corrupted images will throw

Showcase Apps

| App | Description | Links |
|---|---|---|
| Image Captioner | Generate natural language descriptions of images | Demo · Source |
