Vision
Hooks for image captioning, object detection, classification, and segmentation.
Vision Hooks
See it in action
Try Object Detector and Background Remover for working demos of these hooks.
useCaptionImage
Generate a text caption for an image.
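```tsx
import { useCaptionImage } from '@localmode/react';
import { transformers } from '@localmode/transformers';
// Create the model once at module level so it is not recreated on every render
const model = transformers.imageCaptioner('Xenova/vit-gpt2-image-captioning');

function Demo({ imageDataUrl }: { imageDataUrl: string }) {
  const { data, isLoading, execute } = useCaptionImage({ model });
  // After execute(imageDataUrl) completes: data.caption = "A cat sitting on a couch"
  return (
    <button onClick={() => execute(imageDataUrl)} disabled={isLoading}>
      {data?.caption ?? 'Generate caption'}
    </button>
  );
}
```
useDetectObjects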
Detect objects with bounding boxes.
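```tsx
import { useDetectObjects } from '@localmode/react';
// Inside a component; `model` is an object-detection model
const { data, execute } = useDetectObjects({ model });
await execute(imageDataUrl); // e.g. in an async event handler
// After execute completes:
// data.objects = [{ label: 'person', score: 0.95, box: { x, y, width, height } }]
```
useClassifyImage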
Classify an image into categories.
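```tsx
import { useClassifyImage } from '@localmode/react';
// Inside a component; `model` is an image-classification model
const { data, execute } = useClassifyImage({ model });
await execute(imageDataUrl); // e.g. in an async event handler
// After execute completes: data.label = 'cat', data.score = 0.97
```
useSegmentImage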
Segment an image into regions with masks.
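```tsx
import { useSegmentImage } from '@localmode/react';
// Inside a component; `model` is an image-segmentation model
const { data, execute } = useSegmentImage({ model });
await execute(imageDataUrl); // e.g. in an async event handler
// After execute completes:
// data.masks = [{ label: 'background', mask: Uint8Array, score: 0.98 }]
```
useClassifyImageZeroShot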
Zero-shot image classification with custom labels (no fine-tuning needed).
```tsx
import { useClassifyImageZeroShot } from '@localmode/react';
// Inside a component; `model` is a zero-shot image-classification model
const { data, execute } = useClassifyImageZeroShot({ model });
await execute({ image: imageDataUrl, labels: ['cat', 'dog', 'bird'] });
// After execute completes: data.label = 'cat', data.score = 0.92
```
useExtractImageFeatures
Extract feature vectors from images for similarity comparison.
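```tsx
import { useExtractImageFeatures } from '@localmode/react';
// Inside a component; `model` is an image feature-extraction model
const { data, execute } = useExtractImageFeatures({ model });
await execute(imageDataUrl); // e.g. in an async event handler
// After execute completes: data.features = Float32Array(768)
// (the vector length depends on the model)
```
useImageToImage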
Image super-resolution or style transfer.
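```tsx
import { useImageToImage } from '@localmode/react';
// Inside a component; `model` is an image-to-image model (e.g. super-resolution)
const { data, execute } = useImageToImage({ model });
await execute(imageDataUrl); // e.g. in an async event handler
// After execute completes:
// data.image = 'data:image/png;base64,...' (upscaled or transformed image)
```
All vision hooks accept images as data URLs (for example, from FileReader.readAsDataURL()). For model recommendations, see the Transformers guide.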
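
If you already have a `File` (for example from an `<input type="file">`), FileReader.readAsDataURL() turns it into a data URL. A promise-based alternative that also runs outside the browser (Node 18+) can be sketched as below. `blobToDataUrl` is a hypothetical helper written for illustration, not part of `@localmode`; falling back to `application/octet-stream` for an untyped Blob is a choice made in this sketch.

```typescript
// Hypothetical helper (not part of @localmode): encode a Blob/File as a base64 data URL.
async function blobToDataUrl(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  // btoa expects a "binary string" with one char per byte. Build it in chunks so a
  // large image does not exceed the argument limit of String.fromCharCode.
  const chunkSize = 0x8000;
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + chunkSize)));
  }
  // Sketch's own choice: use a generic MIME type when the Blob has none
  const mime = blob.type || 'application/octet-stream';
  return `data:${mime};base64,${btoa(binary)}`;
}
```

Inside an async event handler this would be used as `await execute(await blobToDataUrl(file))`.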
Landmark & Gesture Hooks
@localmode/react provides hooks for MediaPipe landmark and gesture detection.
Each takes { model } (from @localmode/mediapipe) and returns the standard
{ data, error, isLoading, execute, cancel, reset } shape.
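```tsx
import { useDetectHands } from '@localmode/react';
import { mediapipe } from '@localmode/mediapipe';

// Create the model once at module level so it is not recreated on every render
const handModel = mediapipe.handLandmarker();

// Inside a component:
const { data, execute } = useDetectHands({ model: handModel });
await execute(imageBlob); // e.g. in an async event handler
// After execute completes: data.hands = [{ landmarks, worldLandmarks, handedness, score }, ...]
```
| Hook | Detects |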
|---|---|
| useDetectHands | 21-point hand landmarks |
| useDetectPose | 33-point body pose landmarks |
| useDetectFace | Face bounding boxes and keypoints |
| useDetectFaceLandmarks | 478-point face mesh (+ optional blendshapes) |
| useRecognizeGesture | Hand gestures (8 categories) |
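
The hand landmarks returned by useDetectHands can drive simple interactions. The sketch below flags a pinch by comparing the thumb-tip to index-fingertip distance with the hand's size. It assumes each `landmarks` entry is a MediaPipe-style array of 21 normalized `{ x, y, z }` points using MediaPipe's hand indexing (0 = wrist, 4 = thumb tip, 8 = index fingertip, 9 = middle-finger MCP); check the actual types exported by `@localmode/mediapipe`. The `Landmark` interface, `isPinching`, and the 0.25 threshold are hypothetical and need tuning.

```typescript
// Assumed landmark shape (MediaPipe-style normalized coordinates).
interface Landmark {
  x: number;
  y: number;
  z: number;
}

// Indices from MediaPipe's 21-point hand landmark model.
const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;
const MIDDLE_FINGER_MCP = 9;

// Hypothetical helper: true when the thumb and index tips are close relative to hand size.
function isPinching(landmarks: Landmark[], threshold = 0.25): boolean {
  if (landmarks.length < 21) return false; // not a full hand
  // 2D distance in image coordinates; z (relative depth) is ignored
  const dist = (a: Landmark, b: Landmark) => Math.hypot(a.x - b.x, a.y - b.y);
  // Wrist to middle-finger MCP approximates palm size, so the test does not depend
  // on how close the hand is to the camera
  const handSize = dist(landmarks[WRIST], landmarks[MIDDLE_FINGER_MCP]);
  if (handSize === 0) return false;
  return dist(landmarks[THUMB_TIP], landmarks[INDEX_FINGER_TIP]) / handSize < threshold;
}
```

With the result shape shown above, a check could look like `data?.hands.some((hand) => isPinching(hand.landmarks))`.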

For real-time video tracking at 30-60 fps, use the streaming tracker API from
@localmode/mediapipe instead of these single-frame hooks.
Showcase Apps
| App | Description | Links |
|---|---|---|
| Object Detector | Detect objects with useDetectObjects | Demo · Source |
| Background Remover | Segment images with useSegmentImage | Demo · Source |
| Photo Enhancer | Enhance images with useImageToImage | Demo · Source |
| Image Captioner | Caption images with useOperationList | Demo · Source |
| MediaPipe Studio | Real-time hand/pose/face/gesture tracking | Demo · Source |
| Duplicate Finder | Compare image features with useSequentialBatch | Demo · Source |
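
Feature vectors such as `data.features` from useExtractImageFeatures are commonly compared with cosine similarity. The pairwise comparison at the heart of a duplicate finder can be sketched as below. This is a from-scratch illustration, not the Duplicate Finder's actual source: `cosineSimilarity`, `findDuplicatePairs`, and the 0.95 threshold are hypothetical, and all vectors are assumed to come from the same model (so they have equal length).

```typescript
// Cosine similarity of two equal-length feature vectors (1 = same direction, 0 = orthogonal).
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error('Feature vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for all-zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return index pairs [i, j] (i < j) whose similarity is at least `threshold`.
function findDuplicatePairs(features: Float32Array[], threshold = 0.95): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < features.length; i++) {
    for (let j = i + 1; j < features.length; j++) {
      if (cosineSimilarity(features[i], features[j]) >= threshold) pairs.push([i, j]);
    }
  }
  return pairs;
}
```

The nested loop is O(n²) in the number of images, which is manageable for a few hundred images; for large collections an approximate nearest-neighbor index scales better.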