
Zero-Shot Image Classification

Classify images into arbitrary categories with CLIP (Contrastive Language-Image Pre-Training) models, no fine-tuning required. Provide any set of candidate labels, and the model scores each one against the image.

For the full API reference (classifyImageZeroShot(), options, result types, and custom providers), see the Core Vision guide.

See it in action

Try Smart Gallery and Product Search for working demos.

How It Works

CLIP models learn to associate images with text descriptions. Unlike traditional classifiers trained on fixed categories, CLIP can classify images into any labels you provide at inference time.

┌──────────────┐     ┌──────────────┐
│  Image       │     │  Text Labels │
│  Encoder     │     │  Encoder     │
└──────┬───────┘     └──────┬───────┘
       │                    │
       ▼                    ▼
  [image vector]      [text vectors]
       │                    │
       └────── cosine ──────┘
              similarity
               scores
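
The scoring step in the diagram can be sketched in plain TypeScript. This is an illustration only, not the library's implementation: the function names are hypothetical, the vectors are toy values rather than real CLIP embeddings, and the scale factor of 100 followed by a softmax is an assumption about how CLIP-style pipelines typically turn similarities into scores.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Numerically stable softmax: turns raw logits into scores that sum to 1.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((acc, x) => acc + x, 0);
  return exps.map((x) => x / sum);
}

// Score each label vector against the image vector.
// `scale` is an assumed temperature, not a value taken from the library.
function scoreLabels(
  imageVector: number[],
  textVectors: number[][],
  scale = 100,
): number[] {
  const logits = textVectors.map((t) => scale * cosineSimilarity(imageVector, t));
  return softmax(logits);
}

// Toy 3-dimensional vectors (hypothetical values).
const toyImage = [0.9, 0.1, 0.0];
const toyLabelVectors = [
  [1.0, 0.0, 0.0], // stands in for the first candidate label
  [0.0, 1.0, 0.0], // stands in for the second candidate label
];
const toyScores = scoreLabels(toyImage, toyLabelVectors);
```

Because the toy image vector points mostly along the first label vector, the first label receives the higher score.
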

| Model | Size | Use Case |
| --- | --- | --- |
| Xenova/clip-vit-base-patch32 | ~150MB | General zero-shot classification |

Practical Examples

Content Moderation

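import { transformers } from '@localmode/transformers';
import { classifyImageZeroShot } from '@localmode/core';

// Create the CLIP-based zero-shot classifier once and reuse it.
const model = transformers.zeroShotImageClassifier('Xenova/clip-vit-base-patch32');

// Score the uploaded image against each moderation label.
const { labels, scores } = await classifyImageZeroShot({
  model,
  image: uploadedImage,
  candidateLabels: ['safe content', 'inappropriate content', 'violent content'],
});

// Treat the image as safe only if the first label is 'safe content' with a score above 0.7.
const isSafe = labels[0] === 'safe content' && scores[0] > 0.7;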
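
The threshold check above can be wrapped in a small helper that does not depend on the order of the returned arrays. This is a sketch under assumptions: `decideModeration` and `ModerationDecision` are hypothetical names, not part of @localmode/core, and the helper assumes `labels` and `scores` are parallel arrays as in the example above.

```typescript
interface ModerationDecision {
  safe: boolean;
  topLabel: string;
  topScore: number;
}

// Hypothetical helper: finds the highest-scoring label itself instead of
// assuming the arrays are already sorted by score.
function decideModeration(
  labels: string[],
  scores: number[],
  safeLabel = 'safe content',
  threshold = 0.7,
): ModerationDecision {
  if (labels.length === 0 || labels.length !== scores.length) {
    throw new Error('labels and scores must be non-empty and the same length');
  }
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  const topLabel = labels[best];
  const topScore = scores[best];
  return { safe: topLabel === safeLabel && topScore > threshold, topLabel, topScore };
}
```

With this shape, `decideModeration(labels, scores).safe` plays the role of `isSafe`, and the top label and score stay available for logging or review queues.
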

Product Categorization

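// Reuses the `model` created in the Content Moderation example.
const { labels } = await classifyImageZeroShot({
  model,
  image: productPhoto,
  candidateLabels: ['electronics', 'clothing', 'furniture', 'food', 'toys'],
});

// The first label is taken as the predicted category.
console.log(`Category: ${labels[0]}`);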
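
When a product could plausibly belong to more than one category, it can help to look at the runners-up as well as the top label. The sketch below pairs labels with scores and returns the best `k`; `topK` and `RankedLabel` are hypothetical names, and the code assumes `labels` and `scores` are parallel arrays like those returned in the Content Moderation example.

```typescript
interface RankedLabel {
  label: string;
  score: number;
}

// Hypothetical helper: pair each label with its score, sort by score
// descending, and keep the first k entries.
function topK(labels: string[], scores: number[], k: number): RankedLabel[] {
  if (labels.length !== scores.length) {
    throw new Error('labels and scores must be the same length');
  }
  return labels
    .map((label, i) => ({ label, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, k));
}
```

For example, `topK(labels, scores, 2)` would give the two most likely categories, which a UI could present as suggestions instead of a single hard assignment.
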

Label Engineering

Descriptive labels usually give better results. Use "a photo of a cat" instead of just "cat": the model compares image features against text features, so a more descriptive label gives it a stronger signal to match against.
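
One way to apply this consistently is to expand short category names into descriptive prompts before classification and map the winning prompt back afterwards. The helper below is a minimal sketch: `buildPrompts` and the `{label}` placeholder are hypothetical conventions, not part of the library, and the default template is just the phrasing suggested above.

```typescript
interface PromptSet {
  prompts: string[];
  promptToLabel: Map<string, string>;
}

// Hypothetical helper: wrap each short label in a descriptive template and
// remember which prompt came from which label.
function buildPrompts(
  labels: string[],
  template = 'a photo of a {label}',
): PromptSet {
  const promptToLabel = new Map<string, string>();
  const prompts = labels.map((label) => {
    // A replacer function avoids special handling of `$` sequences in labels.
    const prompt = template.replace('{label}', () => label);
    promptToLabel.set(prompt, label);
    return prompt;
  });
  return { prompts, promptToLabel };
}

const { prompts, promptToLabel } = buildPrompts(['cat', 'dog']);
// prompts is ['a photo of a cat', 'a photo of a dog']
```

The `prompts` array can then be passed as `candidateLabels`, and `promptToLabel.get(...)` recovers the short category name from the top-scoring prompt. Plural or mass nouns such as "electronics" would need a different template, for example `'a photo of {label}'`.
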

Showcase Apps

| App | Description | Links |
| --- | --- | --- |
| Smart Gallery | Classify photos into custom categories | Demo · Source |
| Product Search | Classify products with zero-shot image labels | Demo · Source |
