Does Chrome on Android support WebGPU for AI inference?

WebGPU is enabled by default in Chrome 121+ on Android 12+ devices with Qualcomm or ARM GPUs that support Vulkan 1.1+. On devices without WebGPU, LocalMode automatically falls back to WASM-based inference via wllama or Transformers.js.

What happens if a model is too large for an Android phone?

High-end phones with 8GB+ RAM can run models up to about 1.5GB. Mid-range phones (4-6GB) should use models under 500MB, and low-end phones (2-3GB) are limited to tiny models like SmolLM2-135M at 70MB. Use detectCapabilities() and recommendModels() to select appropriate models automatically.

Can background tab killing interrupt model downloads on Android?

Yes. Android's OOM killer aggressively terminates background tabs, which can interrupt model downloads. LocalMode's createModelLoader uses chunked downloads with automatic resume to handle this, so interrupted downloads can continue where they left off.

How much storage does model caching require on Chrome Android?

Storage depends on the models you download. Chrome lets each origin use up to 60% of total disk size, a quota that IndexedDB shares with the origin's other storage APIs. Recommended Android models range from about 33MB (BGE-small embeddings) to 360MB (SmolLM2-360M). Use getStorageQuota() to check available space before downloading.

What is the minimum Chrome version needed for LocalMode on Android?

Basic WASM-based inference works on Chrome 57+. For WebGPU-accelerated LLM inference, Chrome 121+ on Android 12+ with a compatible GPU is required. Module workers require Chrome 80+.
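
The version thresholds above can be expressed as a small lookup. This is a minimal sketch, not part of LocalMode: `chromeMajorVersion` and `tierForChrome` are hypothetical helper names, and the tiers only restate the version numbers quoted in this answer. Runtime feature detection remains the more reliable signal, since Chrome 121+ still needs Android 12+ and a compatible GPU for WebGPU.

```typescript
// Hypothetical helpers (not a LocalMode API): map a Chrome major version
// to the thresholds quoted above (57 = WASM, 80 = module workers, 121 = WebGPU).
type ChromeTier = 'unsupported' | 'wasm' | 'wasm+module-workers' | 'webgpu-candidate';

// Extract the major version from a user agent string, or null if it is not Chrome.
function chromeMajorVersion(userAgent: string): number | null {
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

function tierForChrome(major: number | null): ChromeTier {
  if (major === null || major < 57) return 'unsupported';
  if (major < 80) return 'wasm';
  if (major < 121) return 'wasm+module-workers';
  // 121+ is only a candidate: the device must also pass a runtime WebGPU check.
  return 'webgpu-candidate';
}
```

In a browser this could be called as `tierForChrome(chromeMajorVersion(navigator.userAgent))`, with the result used as a hint before the real capability check.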

Chrome on Android

Browser AI on Chrome for Android: WebGPU on some devices, WASM everywhere, and tight memory constraints throughout.

Category: Browser Compatibility

Feature Support Matrix

The following table summarizes which web platform features are available on Chrome on Android and how they affect LocalMode's capabilities. Features marked as supported enable full functionality; partial or unsupported features trigger automatic fallbacks.

Feature	Supported	Notes
WebGPU	Chrome 121+ (device-dependent)	WebGPU enabled by default on Android 12+ with Qualcomm or ARM GPUs. Requires Vulkan 1.1+. Support expanded to additional devices in subsequent releases.
WebAssembly	Yes	Full WASM + SIMD support. Primary inference path for mobile.
IndexedDB	Yes	Full support. Chrome grants up to 60% of total disk size per origin (based on total capacity, not free space).
Memory Limit	Constrained	Android Chrome tab memory varies: 1-3GB depending on device RAM (4-12GB total).
Web Workers	Yes	Full support including module workers on Chrome 80+.
Background Execution	Limited	Android aggressively kills background tabs. Model loading may fail if user switches apps.

Understanding the Impact

Each feature in the matrix above maps to specific LocalMode capabilities:

WebGPU - Required for @localmode/webllm (GPU-accelerated LLM inference at 30-90 tokens/second). When unavailable, use @localmode/wllama (WASM, 5-20 tokens/second) as a fallback. Non-LLM tasks (embeddings, classification, vision, audio) do not require WebGPU.
WebAssembly - The universal inference backend. Required for @localmode/transformers and @localmode/wllama. WASM is supported by browsers that account for 97%+ of web traffic. SIMD support (for optimized vector operations) requires a newer browser (Chrome 91+).
IndexedDB - Used for persistent vector storage (VectorDB) and model caching (createModelLoader). When blocked (for example, in Safari Private Browsing), LocalMode falls back to MemoryStorage, and data is lost when the tab closes.
Web Workers - Enable background model loading and inference without blocking the main UI thread. Module workers (for ES module imports in workers) require Chrome 80+.
SharedArrayBuffer - Enables multi-threaded WASM inference for improved performance. Requires Cross-Origin Isolation headers (COOP/COEP). Not required for basic functionality.
Web Locks - Used for cross-tab model loading coordination (prevents multiple tabs from downloading the same model simultaneously). Falls back to InMemoryLockManager when unavailable.
BroadcastChannel - Used for cross-tab VectorDB synchronization. Falls back to LocalStorageBroadcaster when unavailable.
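
Each feature in the list above can be checked with standard web platform globals. The sketch below is illustrative only and is not LocalMode's detection code: `probeFeatures`, `FeatureReport`, and `GlobalLike` are hypothetical names, and the function takes the global object as a parameter so it can be exercised outside a browser.

```typescript
// Illustrative probe built on standard globals only (hypothetical names, not LocalMode's code).
interface FeatureReport {
  webgpu: boolean;
  wasm: boolean;
  indexedDB: boolean;
  workers: boolean;
  sharedArrayBuffer: boolean;
  webLocks: boolean;
  broadcastChannel: boolean;
}

// Minimal view of the global object; every field is optional because any may be absent.
type GlobalLike = {
  navigator?: { gpu?: unknown; locks?: unknown };
  WebAssembly?: unknown;
  indexedDB?: unknown;
  Worker?: unknown;
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
  BroadcastChannel?: unknown;
};

function probeFeatures(g: GlobalLike): FeatureReport {
  return {
    // navigator.gpu is necessary but not sufficient: requestAdapter() can still resolve to null.
    webgpu: g.navigator?.gpu !== undefined,
    wasm: typeof g.WebAssembly === 'object' && g.WebAssembly !== null,
    indexedDB: g.indexedDB !== undefined,
    workers: typeof g.Worker === 'function',
    // Multi-threaded WASM needs the constructor and cross-origin isolation together.
    sharedArrayBuffer: typeof g.SharedArrayBuffer === 'function' && g.crossOriginIsolated === true,
    webLocks: g.navigator?.locks !== undefined,
    broadcastChannel: typeof g.BroadcastChannel === 'function',
  };
}
```

In a page, the call would be `probeFeatures(globalThis)`. Cross-origin isolation is switched on by serving the page with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`.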

Fallback Strategies

Mobile Android is highly variable. High-end phones (8GB+ RAM, recent Snapdragon/MediaTek) can run models up to ~1.5GB. Mid-range phones (4-6GB RAM) should stick to models under 500MB. Low-end phones (2-3GB RAM) are limited to tiny models (SmolLM2-135M, D-FINE at 5MB). Use detectCapabilities() to detect hardware and recommendModels() to select appropriate models. Always have a fallback to the smallest model.
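
The RAM tiers above translate into a simple size budget. The following is a rough sketch under stated assumptions: `maxModelSizeMB` and `pickLargestThatFits` are hypothetical helpers (not LocalMode APIs), and the 70MB low-end budget is an assumption taken from the size this page quotes for SmolLM2-135M.

```typescript
// Hypothetical helpers mirroring the RAM tiers described above (not a LocalMode API).
interface ModelOption {
  id: string;
  sizeMB: number;
}

// Map device RAM (in GB) to an approximate upper bound on model size (in MB).
function maxModelSizeMB(deviceRamGB: number): number {
  if (deviceRamGB >= 8) return 1500; // high-end: up to about 1.5GB
  if (deviceRamGB >= 4) return 500; // mid-range: stay at or below 500MB
  return 70; // low-end: tiny models only (assumed budget, SmolLM2-135M is about 70MB)
}

// Pick the largest candidate that fits the budget, or undefined if none does.
function pickLargestThatFits(models: ModelOption[], deviceRamGB: number): ModelOption | undefined {
  const budget = maxModelSizeMB(deviceRamGB);
  return models
    .filter((m) => m.sizeMB <= budget)
    .sort((a, b) => b.sizeMB - a.sizeMB)[0];
}
```

If the RAM figure comes from `navigator.deviceMemory`, note that Chrome caps the reported value at 8, so the high-end tier cannot be split further with that signal alone.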

LocalMode is designed with progressive enhancement in mind. The core principle: detect capabilities at runtime and use the best available path. The @localmode/core package exports detection utilities for this purpose:

import {
  isWebGPUSupported,
  isIndexedDBSupported,
  isCrossOriginIsolated,
  detectCapabilities,
  recommendModels,
} from '@localmode/core';

async function detectAndConfigure() {
  const caps = await detectCapabilities();
  console.log(caps);
  // caps.features.webgpu, caps.hardware.memory (GB), caps.storage.availableBytes

  // isWebGPUSupported() is async - it must be awaited
  const useWebGPU = await isWebGPUSupported();
  // true: use @localmode/webllm for GPU-accelerated inference
  // false: fall back to @localmode/wllama (WASM)

  // recommendModels() is synchronous: capabilities first, options second
  const recommendations = recommendModels(caps, {
    task: 'generation',
    maxSizeMB: 1500,
  });

  // Return the results so callers can act on them
  return { caps, useWebGPU, recommendations };
}
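
The advice to always keep a fallback to the smallest model can be sketched as a generic retry chain. This is an illustration rather than LocalMode's own loading logic: `loadWithFallback` is a hypothetical helper, and its `load` callback stands in for whichever provider call the application actually uses.

```typescript
// Generic fallback chain: try each candidate in order (largest first) until one loads.
// `load` is a placeholder for the app's real provider call; it is not a LocalMode API.
async function loadWithFallback<T>(
  candidates: string[],
  load: (modelId: string) => Promise<T>,
): Promise<{ modelId: string; model: T }> {
  const errors: string[] = [];
  for (const modelId of candidates) {
    try {
      return { modelId, model: await load(modelId) };
    } catch (err) {
      // Memory pressure, network failures, and browser bugs all surface here.
      errors.push(`${modelId}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Every candidate failed: report all reasons so the UI can explain what happened.
  throw new Error(`All candidates failed: ${errors.join('; ')}`);
}
```

Ordering the candidate list from largest to smallest means the device gets the best model it can actually hold, and the smallest model acts as the last resort.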

Recommended Providers

For Chrome on Android, the recommended LocalMode providers are:

wllama (WASM) - Universal LLM inference via WASM. Works without WebGPU. The safe choice for broad compatibility.
Transformers.js - Broadest model catalog for non-LLM tasks (embeddings, classification, vision, audio). WASM-based, works everywhere.

Recommended Models

The following models are tested and recommended for Chrome on Android:

Model	Provider
SmolLM2-360M-Instruct-Q4_K_M	wllama (WASM)
Xenova/bge-small-en-v1.5	Transformers.js
onnx-community/dfine_n_coco-ONNX	Transformers.js

These models are chosen for their compatibility with Chrome on Android's capabilities and constraints. They represent the best balance of quality, size, and performance for this platform.

Known Issues

Background tab killing - Android's OOM killer can terminate background tabs and interrupt model downloads. Use chunked downloads with resume (built into createModelLoader).
Battery impact - Sustained inference drains the battery.
Memory crashes - Models above 2GB may cause Chrome to crash on devices with 4GB of RAM.
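
To make the idea of chunked downloads with resume concrete, here is a minimal sketch. It is not createModelLoader's actual implementation: `resumeDownload`, `DownloadState`, and the `fetchRange` callback are hypothetical, and the chunks are kept in memory here where a real loader would persist them.

```typescript
// Sketch of resumable chunked downloading (hypothetical names, not createModelLoader's code).
// `fetchRange` returns the bytes from `start` to `end` inclusive, e.g. via an HTTP Range request.
type FetchRange = (start: number, end: number) => Promise<Uint8Array>;

interface DownloadState {
  totalBytes: number;
  chunks: Uint8Array[]; // chunks received so far, in order
}

function bytesDownloaded(state: DownloadState): number {
  return state.chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
}

// Continue from wherever the state left off; safe to call again after a failure.
async function resumeDownload(
  state: DownloadState,
  fetchRange: FetchRange,
  chunkSize = 4 * 1024 * 1024,
): Promise<DownloadState> {
  let offset = bytesDownloaded(state);
  while (offset < state.totalBytes) {
    const end = Math.min(offset + chunkSize, state.totalBytes) - 1;
    const chunk = await fetchRange(offset, end);
    if (chunk.byteLength === 0) throw new Error('Empty chunk received');
    state.chunks.push(chunk); // a real loader would persist the chunk at this point
    offset += chunk.byteLength;
  }
  return state;
}
```

With `fetch`, the callback would send a `Range: bytes=start-end` header and expect a 206 response. Because progress lives in the state object, a tab that is killed and reopened only needs to reload that state and call `resumeDownload` again.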

Mitigation Strategies

When building applications that target Chrome on Android, follow these practices:

Always detect before loading - Use await isWebGPUSupported(), isIndexedDBSupported(), and await detectCapabilities() before attempting to load models or create storage. Never assume a feature is available.
Wrap model loading in try/catch - Even when detection succeeds, model loading can fail due to memory pressure, network issues, or browser bugs. Always have a fallback path that attempts a smaller model.
Pick models with recommendModels() - Pass the detected capabilities to recommendModels(caps, { task }) to select a model appropriate for the current device. This is the recommended pattern for production deployments.
Test on real hardware - Browser DevTools device emulation does not accurately simulate memory limits, GPU capabilities, or storage quotas. Test on actual Android devices.
Monitor storage quota - Use getStorageQuota() to check available space before downloading large models. Inform users if storage is insufficient rather than failing silently.
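
The storage check in the last practice can be approximated with the standard Storage API. This is a sketch under assumptions: `canStoreModel` is a hypothetical helper, its input mirrors the `{ usage, quota }` object that `navigator.storage.estimate()` resolves to, and the 10% headroom is an arbitrary safety margin. How getStorageQuota() reports its values is not shown on this page, so the helper does not depend on it.

```typescript
// Hypothetical helper: decide whether a model fits in the origin's remaining quota.
// The input shape mirrors the result of navigator.storage.estimate().
interface StorageEstimateLike {
  usage?: number; // bytes already used by this origin
  quota?: number; // bytes this origin is allowed to use
}

function canStoreModel(estimate: StorageEstimateLike, modelBytes: number, headroom = 0.1): boolean {
  // If the browser does not report usage or quota, be conservative and refuse.
  if (estimate.usage === undefined || estimate.quota === undefined) return false;
  const freeBytes = estimate.quota - estimate.usage;
  // Keep a margin so the download does not fill the quota completely.
  return modelBytes * (1 + headroom) <= freeBytes;
}
```

In a page, the check could read `canStoreModel(await navigator.storage.estimate(), modelBytes)`, with a user-facing message shown when it returns false.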

Web Standards References

Chrome Android Feature Support

Chrome Desktop - compatibility guide
Low Memory Devices - compatibility guide
SmolLM2 - model guide

Methodology

Browser feature support data on this page is sourced from MDN Web Docs, caniuse.com, and official Chrome developer documentation (developer.chrome.com), cross-referenced with LocalMode's runtime feature detection (packages/core/src/capabilities/features.ts). Browser version numbers reflect the point at which each feature shipped enabled by default, verified against Chrome Platform Status entries and Chrome release blog posts. Storage quota figures are taken directly from MDN's Storage API documentation and the web.dev storage guide. Browser support evolves - verify current support with the linked references for production decisions. Data current as of May 2026.

Sources

What's New in WebGPU (Chrome 121) - WebGPU enabled on Android - confirms Chrome 121, Android 12+, Qualcomm/ARM GPUs
Chrome ships WebGPU (Chrome 113, desktop) - desktop WebGPU launch reference
WebGPU on Android - Chrome Platform Status
Storage quotas and eviction criteria - MDN - Chrome: 60% of total disk size
Storage for the web - web.dev - confirms 60% of total disk, not free space
Worker() constructor: ECMAScript modules support - caniuse - module workers Chrome 80+
ES Modules for dedicated workers - Chrome Platform Status
New in Chrome 80 - module workers
LocalMode capability detection source (packages/core/src/capabilities/features.ts)

Frequently Asked Questions