Vision

Image captioning, classification, detection, segmentation, and more.

LocalMode provides seven vision functions that run entirely in the browser — no server required. All accept an ImageInput (Blob, ImageData, string URL, or ArrayBuffer) and return structured results with usage metrics.

See it in action

Try Object Detector and Background Remover for working demos of these APIs.

  • captionImage() — Generate natural language descriptions
  • classifyImage() — Classify into pre-trained categories
  • classifyImageZeroShot() — Classify into arbitrary labels (CLIP/SigLIP)
  • detectObjects() — Locate and label objects with bounding boxes
  • segmentImage() — Produce pixel-level masks per region
  • extractImageFeatures() — Extract feature vectors for similarity search
  • imageToImage() — Super-resolution / image transformation

captionImage()

Generate a natural language caption for an image:

import { captionImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.captioner('onnx-community/Florence-2-base-ft');

const { caption, usage, response } = await captionImage({
  model,
  image: imageBlob,
});

console.log(caption); // "a golden retriever playing with a ball in a park"
console.log(`Processed in ${usage.durationMs}ms`);

Pass an abortSignal to cancel a request that runs too long:

const controller = new AbortController();
setTimeout(() => controller.abort(), 10000); // Cancel after 10s

const { caption } = await captionImage({
  model,
  image: imageBlob,
  abortSignal: controller.signal,
});
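AbortSignal.timeout() (available in modern browsers and Node 17.3+) expresses the same ten-second cancellation more compactly:

```typescript
// AbortSignal.timeout() builds a signal that aborts automatically
// after the given number of milliseconds, replacing the manual
// AbortController + setTimeout pairing.
const signal = AbortSignal.timeout(10_000);

// Passed the same way as a controller's signal:
// await captionImage({ model, image: imageBlob, abortSignal: signal });
```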

CaptionImageOptions

CaptionImageResult

classifyImage()

Classify an image into pre-trained categories:

import { classifyImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.imageClassifier('Xenova/vit-base-patch16-224');

const { predictions, usage } = await classifyImage({
  model,
  image: imageBlob,
  topK: 5,
});

predictions.forEach((p) => {
  console.log(`${p.label}: ${(p.score * 100).toFixed(1)}%`);
});
// golden retriever: 92.3%
// Labrador retriever: 4.1%
// ...
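Since predictions is a plain array of { label, score } items, post-filtering is ordinary TypeScript. A sketch with an arbitrary 0.5 cutoff (tune it per model and use case):

```typescript
interface Prediction {
  label: string;
  score: number;
}

// Keep only labels whose score clears a confidence cutoff.
function confidentLabels(predictions: Prediction[], minScore = 0.5): string[] {
  return predictions.filter((p) => p.score >= minScore).map((p) => p.label);
}
```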

ClassifyImageOptions

ClassifyImageResult

ImageClassificationResultItem

classifyImageZeroShot()

Classify an image into arbitrary labels without fine-tuning, using models like CLIP or SigLIP:

import { classifyImageZeroShot } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.zeroShotImageClassifier('Xenova/siglip-base-patch16-224');

const { labels, scores } = await classifyImageZeroShot({
  model,
  image: imageBlob,
  candidateLabels: ['cat', 'dog', 'bird', 'car', 'tree'],
});

console.log(`Top prediction: ${labels[0]} (${(scores[0] * 100).toFixed(1)}%)`);
// Top prediction: dog (87.2%)
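labels and scores come back as parallel arrays (the snippet above reads the top result from index 0), so zipping them gives a single ranked list for display:

```typescript
// Pair each label with its score. The arrays are assumed to be
// index-aligned and already sorted best-first, as in the example above.
function rankLabels(
  labels: string[],
  scores: number[],
): { label: string; score: number }[] {
  return labels.map((label, i) => ({ label, score: scores[i] }));
}
```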

ClassifyImageZeroShotOptions

ClassifyImageZeroShotResult

detectObjects()

Detect and locate objects in an image with bounding boxes:

import { detectObjects } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.objectDetector('onnx-community/dfine_n_coco-ONNX');

const { objects, usage } = await detectObjects({
  model,
  image: imageBlob,
  threshold: 0.7,
});

for (const obj of objects) {
  console.log(`${obj.label} (${(obj.score * 100).toFixed(1)}%)`);
  console.log(`  Box: x=${obj.box.x}, y=${obj.box.y}, ${obj.box.width}x${obj.box.height}`);
}
// person (95.2%)
//   Box: x=120, y=45, 200x380
// dog (88.7%)
//   Box: x=350, y=210, 150x170
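The box fields above are absolute pixels in the source image's coordinate space. To draw them on a display surface of a different size (a responsive canvas, for instance), they need rescaling; a small, hypothetical helper:

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Scale a detection box from the source image's pixel space to a
// display surface of a different size.
function scaleBox(box: Box, srcW: number, srcH: number, dstW: number, dstH: number): Box {
  const sx = dstW / srcW;
  const sy = dstH / srcH;
  return {
    x: box.x * sx,
    y: box.y * sy,
    width: box.width * sx,
    height: box.height * sy,
  };
}
```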

DetectObjectsOptions

DetectObjectsResult

DetectedObject

segmentImage()

Segment an image into pixel-level regions:

import { segmentImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.segmenter('briaai/RMBG-1.4');

const { masks, usage } = await segmentImage({
  model,
  image: imageBlob,
});

for (const mask of masks) {
  console.log(`${mask.label}: ${(mask.score * 100).toFixed(1)}%`);
}
// foreground: 97.8%
// background: 96.2%
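Background-removal models such as RMBG-1.4 typically emit a foreground mask. A helper for picking the best mask by label, using only the label/score fields shown above (the full SegmentMask shape may carry pixel data as well):

```typescript
interface SegmentMaskLike {
  label: string;
  score: number;
}

// Return the highest-scoring mask matching a label, or undefined
// if no mask carries that label.
function pickMask<T extends SegmentMaskLike>(masks: T[], label: string): T | undefined {
  return masks
    .filter((m) => m.label === label)
    .sort((a, b) => b.score - a.score)[0];
}
```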

SegmentImageOptions

SegmentImageResult

SegmentMask

extractImageFeatures()

Extract a feature vector from an image for similarity search, clustering, or reverse image lookup:

import { extractImageFeatures, cosineSimilarity } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.imageFeatures('Xenova/siglip-base-patch16-224');

const { features: features1 } = await extractImageFeatures({
  model,
  image: image1,
});

const { features: features2 } = await extractImageFeatures({
  model,
  image: image2,
});

const similarity = cosineSimilarity(features1, features2);
console.log(`Image similarity: ${(similarity * 100).toFixed(1)}%`);
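A brute-force nearest-image lookup can be built directly on cosine similarity. This sketch inlines its own cosine implementation so it is self-contained; the cosineSimilarity export from @localmode/core should behave equivalently for plain numeric vectors:

```typescript
// Plain cosine similarity over number arrays (assumes non-zero vectors).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force nearest neighbour over a small in-memory index of
// previously extracted feature vectors.
function mostSimilar(
  query: number[],
  index: { id: string; features: number[] }[],
): string {
  let bestId = '';
  let bestScore = -Infinity;
  for (const entry of index) {
    const s = cosine(query, entry.features);
    if (s > bestScore) {
      bestScore = s;
      bestId = entry.id;
    }
  }
  return bestId;
}
```

For large collections, an approximate index would replace the linear scan, but a loop like this is fine for a few thousand images.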

ExtractImageFeaturesOptions

ExtractImageFeaturesResult

imageToImage()

Transform an image using super-resolution or other image-to-image models. This is an alias for upscaleImage().

import { imageToImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.imageToImage('Xenova/swin2SR-lightweight-x2-64');

const { image, usage } = await imageToImage({
  model,
  image: lowResImage,
  scale: 2,
});

console.log(`Upscaled in ${usage.durationMs}ms`);

UpscaleImageOptions

UpscaleImageResult

Image Input Types

Supported image formats

All vision functions accept ImageInput, which is a union of four types:

  • Blob — File uploads, fetch() responses, canvas exports
  • ImageData — Raw pixel data from <canvas> via getImageData()
  • string — A URL (data URI, object URL, or remote URL)
  • ArrayBuffer — Raw binary image data

// From a file input
const blob: Blob = fileInput.files![0];

// From a canvas
const imageData: ImageData = ctx.getImageData(0, 0, width, height);

// From a URL
const url: string = 'https://example.com/photo.jpg';

// From fetch
const buffer: ArrayBuffer = await fetch(url).then((r) => r.arrayBuffer());
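Purely for illustration (the library normalizes these inputs internally), here is how the four kinds can be distinguished at runtime:

```typescript
// Illustrative discriminator over the ImageInput union; you never need
// this yourself, but it shows how the four input kinds differ at runtime.
function describeInput(input: unknown): 'url' | 'arraybuffer' | 'blob' | 'imagedata' {
  if (typeof input === 'string') return 'url';
  if (input instanceof ArrayBuffer) return 'arraybuffer';
  if (typeof Blob !== 'undefined' && input instanceof Blob) return 'blob';
  // ImageData-like: anything with raw pixel data and dimensions.
  return 'imagedata';
}
```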

Custom Provider

Implement the ImageCaptionModel interface to create a custom captioning provider. The other six vision interfaces (ImageClassificationModel, ZeroShotImageClassificationModel, ObjectDetectionModel, SegmentationModel, ImageFeatureModel, ImageToImageModel) follow the same pattern.

import type { ImageCaptionModel, DoCaptionImageOptions, DoCaptionImageResult } from '@localmode/core';

class MyCustomCaptioner implements ImageCaptionModel {
  readonly modelId = 'custom:my-captioner';
  readonly provider = 'custom';

  async doCaption(options: DoCaptionImageOptions): Promise<DoCaptionImageResult> {
    const { images, maxLength, abortSignal } = options;
    const start = performance.now();

    // Your captioning logic here
    const captions = images.map(() => 'A description of the image');

    return {
      captions,
      usage: { durationMs: performance.now() - start },
    };
  }
  }
}

// Use with core functions
const model = new MyCustomCaptioner();
const { caption } = await captionImage({ model, image: imageBlob });

For recommended models, provider-specific options, and practical recipes, see the Transformers.js provider pages: Image Captioning, Image Classification, Zero-Shot Image, Object Detection, Image Segmentation, Image Features, and Image-to-Image.

Next Steps

Showcase Apps

  • Object Detector — Detect and label objects in images (Demo · Source)
  • Image Captioner — Generate natural language image descriptions (Demo · Source)
  • Background Remover — Segment and remove image backgrounds (Demo · Source)
  • Photo Enhancer — Upscale and enhance photos with image-to-image models (Demo · Source)
  • Duplicate Finder — Extract image features for duplicate detection (Demo · Source)
  • Smart Gallery — Classify and organize photos by content (Demo · Source)
  • Product Search — Visual product classification and search (Demo · Source)
