Image Classification
Classify images into predefined categories using Vision Transformer (ViT) models. The model returns the top predicted labels with confidence scores.
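As a rough illustration of what "top predicted labels with confidence scores" means, the sketch below turns raw classifier outputs (logits) into probabilities with a softmax and keeps the highest-scoring entries. It is a minimal sketch, not the library's implementation: the `Prediction` shape, the `softmax` and `topK` helpers, and the three sample labels are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical result shape, for illustration only.
// The library's actual result types are described in the Core Vision guide.
interface Prediction {
  label: string;
  score: number; // confidence in [0, 1]
}

// Numerically stable softmax: subtract the largest logit before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pair each label with its probability and return the k best, highest first.
function topK(logits: number[], labels: string[], k: number): Prediction[] {
  if (logits.length !== labels.length) {
    throw new Error("logits and labels must have the same length");
  }
  const probs = softmax(logits);
  return probs
    .map((score, i) => ({ label: labels[i], score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Made-up logits for three made-up labels (a real ImageNet model has 1000).
const labels = ["tabby cat", "golden retriever", "sports car"];
const logits = [2.0, 0.5, -1.0];
const predictions = topK(logits, labels, 2);
console.log(predictions);
```

Because the scores come from a softmax over all classes, they sum to 1 across the full label set, so a low top score usually means the model is spreading its confidence over several candidates.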
For the full API reference (classifyImage(), options, result types, and custom providers), see the Core Vision guide.
See it in action
Try Smart Gallery for a working demo.
Recommended Models
| Model | Size | Categories | Use Case |
|---|---|---|---|
| Xenova/vit-base-patch16-224 | ~86MB | 1000 ImageNet classes | General image classification |
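The model name follows the usual ViT naming convention: `patch16-224` indicates 224×224 pixel inputs split into 16×16 pixel patches. The sketch below works out the resulting patch grid from the name. The `parseVitName` helper and the `VitGeometry` type are hypothetical, written only to show the arithmetic, and are not part of any library.

```typescript
// Hypothetical helper, for illustration only: derives the patch grid
// from a ViT model name of the form "...patch<P>-<S>".
interface VitGeometry {
  patchSize: number;      // side length of one square patch, in pixels
  imageSize: number;      // side length of the square input image, in pixels
  patchesPerSide: number; // patches along one edge of the image
  patchCount: number;     // total patches the image is split into
}

function parseVitName(name: string): VitGeometry | null {
  const match = /patch(\d+)-(\d+)/.exec(name);
  if (!match) return null; // name does not follow the expected convention

  const patchSize = Number(match[1]);
  const imageSize = Number(match[2]);
  const patchesPerSide = Math.floor(imageSize / patchSize);

  return {
    patchSize,
    imageSize,
    patchesPerSide,
    patchCount: patchesPerSide * patchesPerSide,
  };
}

const geometry = parseVitName("Xenova/vit-base-patch16-224");
console.log(geometry);
```

For this model the image is divided into a 14×14 grid, or 196 patches. Input images of other sizes are typically resized to the expected resolution during preprocessing.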
ImageNet Classes
ViT models trained on ImageNet classify images into 1000 categories, including animals, vehicles, food, and everyday objects. To classify images into custom categories, use Zero-Shot Image Classification with CLIP.