What is the best model for image segmentation in the browser?

For full semantic segmentation across 150 ADE20K categories, use Xenova/segformer-b0-finetuned-ade-512-512 (15.3MB fp32). For real-time selfie and person background removal, MediaPipe's Selfie Segmenter is only 250KB and runs at 30+ fps.

How large is the model download for browser image segmentation?

SegFormer B0 is 15.3MB (fp32), with a quantized variant at ~4.4MB. MediaPipe's Selfie Segmenter is only 250KB, making it one of the smallest ML models available for any task.

Does browser-based image segmentation work offline?

Yes. After the initial model download (as small as 250KB for selfie segmentation), image segmentation works completely offline with no server or API key required.

What categories can browser semantic segmentation detect?

SegFormer B0 classifies every pixel across 150 ADE20K categories including buildings, roads, sky, vegetation, furniture, and people. MediaPipe's Selfie Segmenter focuses specifically on person and background segmentation.

Image Segmentation in the Browser

Classify every pixel in an image into semantic categories - roads, buildings, people, sky - using SegFormer.

What Is Image Segmentation?

Semantic segmentation assigns a category label to every pixel in an image, creating a detailed map of the scene. SegFormer uses a hierarchical transformer encoder with a lightweight MLP decoder to produce pixel-level predictions across 150 ADE20K categories including buildings, roads, sky, vegetation, furniture, and people.
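
To make "a category label for every pixel" concrete, the sketch below treats a segmentation result as a flat array of class indices and tallies how many pixels fall into each category. The function name, the row-major Uint8Array layout, and the three-class list are illustrative assumptions, not the output format of segmentImage().

```typescript
// Tally pixels per category in a label map.
// labelMap holds one class index per pixel in row-major order (assumed layout).
function countPixelsPerClass(
  labelMap: Uint8Array,
  classNames: readonly string[],
): Record<string, number> {
  const counts: Record<string, number> = {};
  labelMap.forEach((classIndex) => {
    // Fall back to a placeholder name if the index is outside the class list.
    const name = classNames[classIndex] ?? `unknown(${classIndex})`;
    counts[name] = (counts[name] ?? 0) + 1;
  });
  return counts;
}

// A toy 2x3 "image": top row is all sky, bottom row is road, road, person.
const toyClasses = ['sky', 'road', 'person'];
const toyLabelMap = new Uint8Array([0, 0, 0, 1, 1, 2]);
const toyCounts = countPixelsPerClass(toyLabelMap, toyClasses);
```

Dividing each count by the total pixel count gives the share of the scene that each category covers.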

This capability is exposed through the segmentImage() function in @localmode/core. All processing runs entirely in the browser - no server, no API key, no data leaves the device. After the initial model download, image segmentation works completely offline.

Real-World Applications

Background removal and replacement
Augmented reality scene understanding
Autonomous navigation (road detection)
Medical image analysis (tissue segmentation)
Photo editing (selective adjustments)
Urban planning (land use analysis)

These use cases all benefit from local, on-device processing: user data stays private, there are no per-request API costs, and the application works without internet after initial setup.
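
As an illustration of the first use case, background removal, here is a minimal sketch that clears the alpha channel of every pixel outside a person mask. It assumes the mask has already been converted to one byte per pixel (0 for background, anything else for foreground) and that the pixels are RGBA bytes as found in ImageData.data. The removeBackground name and the mask layout are assumptions made for this example, not part of LocalMode.

```typescript
// Make every pixel outside the mask fully transparent.
// rgba: 4 bytes per pixel (R, G, B, A), as in ImageData.data.
// mask: 1 byte per pixel; 0 = background, non-zero = foreground (assumed).
function removeBackground(
  rgba: Uint8ClampedArray,
  mask: Uint8Array,
): Uint8ClampedArray {
  if (rgba.length !== mask.length * 4) {
    throw new Error('mask must have exactly one entry per RGBA pixel');
  }
  const out = new Uint8ClampedArray(rgba); // copy so the input stays untouched
  mask.forEach((value, pixel) => {
    if (value === 0) {
      out[pixel * 4 + 3] = 0; // zero the alpha byte of background pixels
    }
  });
  return out;
}

// Two pixels: an opaque red foreground pixel and an opaque green background pixel.
const twoPixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 255, 0, 255]);
const cutout = removeBackground(twoPixels, new Uint8Array([1, 0]));
```

In a browser, the returned array could be wrapped in an ImageData object and drawn to a canvas with putImageData().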

Getting Started

Install the required packages:

npm install @localmode/core @localmode/transformers

Import the core function and provider:

import { segmentImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

The recommended starting model is Xenova/segformer-b0-finetuned-ade-512-512 - it provides the best balance of quality, speed, and download size for most applications.

Code Example

import { segmentImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const model = transformers.segmenter('Xenova/segformer-b0-finetuned-ade-512-512');

const { masks } = await segmentImage({
  model,
  image: scenePhoto,
});

// masks: array of SegmentMask objects, one per detected category

This example demonstrates the core workflow: create a model instance from the provider, call the segmentImage() function with your input, and receive structured results. The same pattern works identically across both available providers: Transformers.js and MediaPipe.

Available Models

The following models support image segmentation through LocalMode. Choose based on your target device, acceptable download size, and quality requirements.

Model	Provider	Size	Speed	Quality
Xenova/segformer-b0-finetuned-ade-512-512	Transformers.js	15.3MB (fp32)	Fast	Good
image_segmenter (Selfie Segmenter)	MediaPipe	250KB	Fast	Good (person only)

Choosing a model: For semantic segmentation across 150 ADE20K categories (buildings, sky, roads, etc.), use Xenova/segformer-b0-finetuned-ade-512-512 via Transformers.js; when minimizing download size is critical, use its quantized variant (model_quantized.onnx, ~4.4MB). For real-time selfie/person background removal, use MediaPipe's Selfie Segmenter, which is extremely small (250KB) and runs at 30+ fps.
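
The selection guidance above can be written down as a small lookup. This is only a sketch: chooseSegmentationModel and its types are hypothetical helpers for this guide, and the sizes are the figures quoted in the table.

```typescript
type SegmentationTask = 'semantic' | 'selfie';

interface ModelChoice {
  provider: 'Transformers.js' | 'MediaPipe';
  model: string;
  approxDownload: string;
}

// Hypothetical helper: maps a task to the model recommended in this guide.
function chooseSegmentationModel(
  task: SegmentationTask,
  minimizeDownload = false,
): ModelChoice {
  if (task === 'selfie') {
    // Person/background only, small enough for real-time use.
    return { provider: 'MediaPipe', model: 'image_segmenter', approxDownload: '250KB' };
  }
  // Full 150-category ADE20K segmentation.
  return {
    provider: 'Transformers.js',
    model: 'Xenova/segformer-b0-finetuned-ade-512-512',
    approxDownload: minimizeDownload ? '~4.4MB (quantized)' : '15.3MB (fp32)',
  };
}

const sceneChoice = chooseSegmentationModel('semantic', true);
const selfieChoice = chooseSegmentationModel('selfie');
```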

Cloud vs Local: Cost and Privacy Comparison

Running image segmentation locally eliminates per-request API costs and keeps all data on-device. Here is how the economics compare:

Service	Cost / Notes
Google Cloud Vision	Does not offer segmentation as a standalone API
Custom cloud ML model	Requires significant setup
LocalMode	$0 cost; runs SegFormer B0 (~15MB fp32 ONNX) entirely on-device

The break-even point for most applications is low: if you process more than a few hundred requests per day, local inference costs less than any cloud API within the first week. For privacy-sensitive applications (medical records, legal documents, financial data), the cost comparison is secondary - the ability to process data without it ever leaving the device is the primary value.
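
The break-even arithmetic can be sketched as follows. Every input here is a placeholder: this guide does not quote a cloud price or an integration cost, so substitute your own figures. The function name and parameters are assumptions for illustration.

```typescript
// Days until a one-time local integration cost is offset by avoided cloud fees.
// All three inputs are caller-supplied; none are real quotes from any vendor.
function breakEvenDays(
  oneTimeLocalCost: number,   // e.g. engineering time, expressed in dollars
  requestsPerDay: number,
  cloudPricePer1000: number,  // dollars per 1,000 cloud requests
): number {
  const dailyCloudCost = (requestsPerDay / 1000) * cloudPricePer1000;
  if (dailyCloudCost <= 0) {
    return Infinity; // with no cloud spend, there is nothing to recoup
  }
  return Math.ceil(oneTimeLocalCost / dailyCloudCost);
}

// Placeholder inputs only: $7 one-time cost, 500 requests/day, $2 per 1,000 requests.
const exampleDays = breakEvenDays(7, 500, 2);
```

With these placeholder inputs the daily cloud cost is $1, so the one-time cost is recovered after 7 days. Higher request volumes shorten that period proportionally.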

Available Providers

Transformers.js - ONNX-optimized models via ONNX Runtime Web. Supports both WebGPU and WASM backends. Broadest model catalog for non-LLM tasks. Recommended for ADE20K semantic segmentation via SegFormer.
MediaPipe - Google's WASM + WebGL runtime (no WebGPU). Provides the image_segmenter (Selfie Segmenter, 250KB) for real-time person/background segmentation. Use mediapipe.imageSegmenter() from @localmode/mediapipe.
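
Because Transformers.js can run on either WebGPU or WASM, an application may want to check which backend the current browser is likely to support. The sketch below uses the standard presence check for navigator.gpu. The pickBackend helper is hypothetical, and how the chosen backend is passed to the LocalMode provider is not covered in this guide, so that step is left out.

```typescript
type Backend = 'webgpu' | 'wasm';

// Accepts any navigator-like object so it can also run outside a browser.
// Note: the presence of `gpu` only indicates API support; requesting an
// adapter can still fail on some devices, so keep WASM as the fallback.
function pickBackend(nav: object | undefined): Backend {
  return nav !== undefined && 'gpu' in nav ? 'webgpu' : 'wasm';
}

// Read navigator through globalThis so this also works where it is undefined.
const detectedBackend = pickBackend(
  (globalThis as { navigator?: object }).navigator,
);
```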

AbortSignal Support

All segmentImage() calls support cancellation through the standard AbortSignal API:

const controller = new AbortController();

const promise = segmentImage({
  model,
  image: imageFile,
  abortSignal: controller.signal,
});

// Cancel if needed (e.g., user navigates away)
controller.abort();

This is essential for responsive UIs - cancel in-flight operations when the user navigates away, submits a new query, or closes a dialog. The underlying model inference stops immediately, freeing memory and compute resources.
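
For the "user submits a new query" case, one common pattern is to keep only the latest request alive. The sketch below hands out a fresh AbortSignal on each call and aborts the previous one. createLatestOnly is a hypothetical helper, not part of LocalMode; it relies only on the standard AbortController API.

```typescript
// Returns a function that yields a new AbortSignal on every call and
// aborts the signal it handed out the previous time.
function createLatestOnly(): () => AbortSignal {
  let current: AbortController | undefined;
  return () => {
    current?.abort(); // cancel the previous request, if there was one
    current = new AbortController();
    return current.signal;
  };
}

const nextSignal = createLatestOnly();
const firstSignal = nextSignal();  // signal for the first request
const secondSignal = nextSignal(); // starting a second request aborts the first

// Intended use (not executed here):
//   segmentImage({ model, image, abortSignal: nextSignal() })
```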

React Integration

If you are building a React application, @localmode/react provides hooks that manage loading states, error handling, and cancellation automatically:

npm install @localmode/react

import { useSegmentImage } from '@localmode/react';

The hook returns { data, error, isLoading, execute, cancel } - providing everything a UI component needs to display progress, handle errors, and offer cancellation.
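
A component typically collapses those fields into a single view state before rendering. The sketch below shows one way to do that with a plain function. The field types and the assumption that data is null or undefined until a result arrives are guesses made for this example; check the hook's actual typings before relying on them.

```typescript
// Assumed shape of the state fields returned by the hook (types are guesses).
interface SegmentHookState {
  data: unknown;       // assumed null/undefined until a result arrives
  error: Error | null; // assumed null when there is no error
  isLoading: boolean;
}

type ViewState = 'idle' | 'loading' | 'error' | 'ready';

// Decide what the UI should show; loading takes priority over stale errors or data.
function toViewState(state: SegmentHookState): ViewState {
  if (state.isLoading) return 'loading';
  if (state.error !== null) return 'error';
  if (state.data !== null && state.data !== undefined) return 'ready';
  return 'idle';
}

const initialView = toViewState({ data: null, error: null, isLoading: false });
```

A component could then switch on the returned value to render a spinner, an error message, the mask overlay, or an upload prompt.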


Methodology

This guide was verified against LocalMode's source code (packages/core/src/vision/segment-image.ts, packages/transformers/src/implementations/segmenter.ts, packages/mediapipe/src/implementations/image-segmenter.ts, packages/mediapipe/src/models.ts, packages/react/src/hooks/use-segment-image.ts). Model sizes were verified from the Xenova HuggingFace repository file tree. Architecture and performance figures for SegFormer B0 (3.8M parameters, 37.4% mIoU on ADE20K) are sourced from the original NeurIPS 2021 paper. Quality and performance comparisons are general guidance; benchmark with your own data for production use.
