
Vision

Hooks for image captioning, object detection, classification, and segmentation.

Vision Hooks

See it in action

Try Object Detector and Background Remover for working demos of these hooks.

useCaptionImage

Generate a text caption for an image.

import { useCaptionImage } from '@localmode/react';
import { transformers } from '@localmode/transformers';

const model = transformers.imageCaptioner('Xenova/vit-gpt2-image-captioning');

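function Demo() {
  const { data, isLoading, execute } = useCaptionImage({ model });
  // Call execute(imageDataUrl) from an event handler. Once it resolves,
  // data.caption holds the caption, e.g. "A cat sitting on a couch".
  return null; // render your own UI here
}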

useDetectObjects

Detect objects with bounding boxes.

import { useDetectObjects } from '@localmode/react';

const { data, execute } = useDetectObjects({ model });
await execute(imageDataUrl);
// data.objects = [{ label: 'person', score: 0.95, box: { x, y, width, height } }]
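
To make the result shape concrete, the sketch below filters detections by confidence and converts a box to corner coordinates, which is a common step before drawing overlays. The `Box` and `DetectedObject` types are inferred from the comment above and are assumptions, not types exported by @localmode/react. `filterByScore` and `toCorners` are hypothetical helpers, and the box origin is assumed to be the top-left corner.

```typescript
// Shapes assumed from the example comment above (not official exports).
interface Box { x: number; y: number; width: number; height: number }
interface DetectedObject { label: string; score: number; box: Box }

// Keep only detections at or above a confidence threshold.
function filterByScore(objects: DetectedObject[], minScore: number): DetectedObject[] {
  return objects.filter((o) => o.score >= minScore);
}

// Convert an { x, y, width, height } box to corner coordinates,
// assuming (x, y) is the top-left corner.
function toCorners(box: Box): { left: number; top: number; right: number; bottom: number } {
  return {
    left: box.x,
    top: box.y,
    right: box.x + box.width,
    bottom: box.y + box.height,
  };
}

// Sample data standing in for data.objects.
const sample: DetectedObject[] = [
  { label: 'person', score: 0.95, box: { x: 10, y: 20, width: 100, height: 200 } },
  { label: 'dog', score: 0.4, box: { x: 150, y: 80, width: 60, height: 40 } },
];
const confident = filterByScore(sample, 0.5);
```

In a component, `data.objects` would take the place of `sample` once a detection has completed.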

useClassifyImage

Classify an image into categories.

import { useClassifyImage } from '@localmode/react';

const { data, execute } = useClassifyImage({ model });
await execute(imageDataUrl);
// data.label = 'cat', data.score = 0.97

useSegmentImage

Segment an image into regions with masks.

import { useSegmentImage } from '@localmode/react';

const { data, execute } = useSegmentImage({ model });
await execute(imageDataUrl);
// data.masks = [{ label: 'background', mask: Uint8Array, score: 0.98 }]
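
One way to use a mask is to write it into the alpha channel of the image, as a background remover might. The sketch below assumes the mask holds one byte per pixel in row-major order, with 0 meaning "outside the region" and 255 meaning "inside". The page does not confirm that encoding, so check it against real output. `applyMaskAsAlpha` is a hypothetical helper, not part of the library.

```typescript
// Assumption: `mask` has one byte per pixel (0 = outside, 255 = inside),
// in the same row-major order as the RGBA pixel buffer.
function applyMaskAsAlpha(
  rgba: Uint8ClampedArray,
  mask: Uint8Array,
  invert = false, // set to true when the mask marks the region to remove
): Uint8ClampedArray {
  if (rgba.length !== mask.length * 4) {
    throw new Error('mask size does not match image size');
  }
  const out = new Uint8ClampedArray(rgba); // copy so the input stays untouched
  for (let i = 0; i < mask.length; i++) {
    const m = mask[i] ?? 0;
    out[i * 4 + 3] = invert ? 255 - m : m; // overwrite the alpha byte only
  }
  return out;
}
```

With a canvas, the `rgba` buffer is the `data` property of the `ImageData` returned by `getImageData`. For a mask labelled 'background', passing `invert = true` keeps the foreground instead.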

useClassifyImageZeroShot

Zero-shot image classification with custom labels (no fine-tuning needed).

import { useClassifyImageZeroShot } from '@localmode/react';

const { data, execute } = useClassifyImageZeroShot({ model });
await execute({ image: imageDataUrl, labels: ['cat', 'dog', 'bird'] });
// data.label = 'cat', data.score = 0.92

useExtractImageFeatures

Extract feature vectors from images for similarity comparison.

import { useExtractImageFeatures } from '@localmode/react';

const { data, execute } = useExtractImageFeatures({ model });
await execute(imageDataUrl);
// data.features = Float32Array(768)
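
Feature vectors are usually compared with cosine similarity. The following is a minimal sketch of that comparison, not an API of the library. `cosineSimilarity` is a hypothetical helper that takes two vectors of equal length, such as the `features` arrays from two separate runs.

```typescript
// Cosine similarity between two feature vectors.
// Returns a value in [-1, 1]; values closer to 1 mean more similar.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

For duplicate detection, a threshold on this score decides whether two images count as the same. The right threshold depends on the model and should be tuned on real data.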

useImageToImage

Transform an image, for example with super-resolution or style transfer.

import { useImageToImage } from '@localmode/react';

const { data, execute } = useImageToImage({ model });
await execute(imageDataUrl);
// data.image = 'data:image/png;base64,...' (upscaled/transformed image)

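All vision hooks accept image data URLs, such as those produced by FileReader.readAsDataURL. In the shorter snippets above, model stands for a model instance suited to the task, created with a provider package as in the useCaptionImage example, and the await execute(...) calls belong inside an async function such as an event handler. For model recommendations, see the Transformers guide.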

Landmark & Gesture Hooks

@localmode/react provides hooks for MediaPipe landmark and gesture detection. Each takes { model } (from @localmode/mediapipe) and returns the standard { data, error, isLoading, execute, cancel, reset } shape.

import { useDetectHands } from '@localmode/react';
import { mediapipe } from '@localmode/mediapipe';

const { data, execute } = useDetectHands({ model: mediapipe.handLandmarker() });
await execute(imageBlob);
// data.hands = [{ landmarks, worldLandmarks, handedness, score }, ...]

| Hook | Detects |
| --- | --- |
| useDetectHands | 21-point hand landmarks |
| useDetectPose | 33-point body pose landmarks |
| useDetectFace | Face bounding boxes and keypoints |
| useDetectFaceLandmarks | 478-point face mesh (+ optional blendshapes) |
| useRecognizeGesture | Hand gestures (8 categories) |
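
To draw landmarks over an image they usually need to be in pixel coordinates. The sketch below assumes each landmark carries `x` and `y` normalized to the range 0 to 1 relative to image width and height, which is how MediaPipe itself reports image landmarks. Whether @localmode/mediapipe keeps that convention is an assumption to verify. The `Landmark` and `PixelPoint` types and the `toPixels` helper are hypothetical.

```typescript
// Assumed landmark shape: x and y normalized to [0, 1], optional depth z.
interface Landmark { x: number; y: number; z?: number }
interface PixelPoint { x: number; y: number }

// Scale normalized landmarks to pixel coordinates for a given image size.
function toPixels(landmarks: Landmark[], width: number, height: number): PixelPoint[] {
  return landmarks.map((p) => ({ x: p.x * width, y: p.y * height }));
}
```

For a 640×480 frame, a landmark at `{ x: 0.5, y: 0.25 }` maps to pixel (320, 120).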

For real-time 30-60fps video tracking, use the streaming tracker API from @localmode/mediapipe instead of these single-frame hooks.

Showcase Apps

| App | Description | Links |
| --- | --- | --- |
| Object Detector | Detect objects with useDetectObjects | Demo · Source |
| Background Remover | Segment images with useSegmentImage | Demo · Source |
| Photo Enhancer | Enhance images with useImageToImage | Demo · Source |
| Image Captioner | Caption images with useOperationList | Demo · Source |
| MediaPipe Studio | Real-time hand/pose/face/gesture tracking | Demo · Source |
| Duplicate Finder | Compare image features with useSequentialBatch | Demo · Source |
