Does Chrome Desktop support WebGPU for AI inference?

Yes. Chrome Desktop has full WebGPU support since Chrome 113 (stable since 2023). This enables GPU-accelerated LLM inference at 30-90 tokens per second via @localmode/webllm. Chrome Desktop is the recommended development target for LocalMode.

How much storage does model caching require on Chrome Desktop?

Chrome grants up to 60% of total disk size per origin for IndexedDB storage. Model sizes vary from about 33MB for embeddings (BGE-small) to several gigabytes for larger LLMs. Use navigator.storage.persist() to prevent Chrome from evicting cached models under storage pressure.

Can I use SharedArrayBuffer on Chrome Desktop?

Yes, but it requires Cross-Origin Isolation headers (Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy). SharedArrayBuffer enables multi-threaded WASM inference for improved performance but is not required for basic LocalMode functionality.

What is the minimum Chrome Desktop version needed for LocalMode?

WebAssembly works from Chrome 57+, which covers basic inference. WebGPU requires Chrome 113+. For the full feature set including Web Locks (69+) and BroadcastChannel (54+), Chrome 113+ is recommended.

Does Chrome Desktop support Chrome Built-in AI with LocalMode?

Yes. Chrome 138+ provides stable Summarizer and Translator APIs via Gemini Nano with zero model download required. LocalMode's @localmode/chrome-ai provider wraps these APIs. Prompt API is stable for extensions since Chrome 138 and for web pages since Chrome 148.

Chrome Desktop

Full LocalMode support on Chrome Desktop - WebGPU, WASM, IndexedDB, Web Workers, and all 136 curated models.

Category: Browser Compatibility

Feature Support Matrix

The following table summarizes which web platform features are available on Chrome Desktop and how they affect LocalMode's capabilities. Features marked as supported enable full functionality; partial or unsupported features trigger automatic fallbacks.

Feature	Supported	Notes
WebGPU	Yes (Chrome 113+)	Full WebGPU support. Required for WebLLM. Enables fastest LLM inference (30-90 tok/s).
WebAssembly	Yes (Chrome 57+)	Full WASM + SIMD support. Required for wllama and Transformers.js.
IndexedDB	Yes	Full support. Used for VectorDB persistence and model caching. Quota: up to 60% of total disk size per origin (Chrome).
Web Workers	Yes	Full support including module workers. Enables background model loading and inference.
SharedArrayBuffer	Yes (with COOP/COEP headers)	Requires Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers.
Web Locks	Yes (Chrome 69+)	Full support. Used for cross-tab model loading coordination.
BroadcastChannel	Yes (Chrome 54+)	Full support. Used for cross-tab VectorDB sync.
Chrome Built-in AI	Chrome 138+ (Summarizer/Translator stable, Prompt API stable for extensions since 138, web pages since 148)	@localmode/chrome-ai for zero-download summarization and translation via Gemini Nano.

Understanding the Impact

Each feature in the matrix above maps to specific LocalMode capabilities:

WebGPU - Required for @localmode/webllm (GPU-accelerated LLM inference at 30-90 tokens/second). When unavailable, use @localmode/wllama (WASM, 5-20 tokens/second) as a fallback. Non-LLM tasks (embeddings, classification, vision, audio) do not require WebGPU.
WebAssembly - The universal inference backend. Required for @localmode/transformers and @localmode/wllama. WASM is supported in 97%+ of web traffic. SIMD support (for optimized vector operations) requires newer browser versions.
IndexedDB - Used for persistent vector storage (VectorDB) and model caching (createModelLoader). When blocked (Safari Private Browsing), LocalMode falls back to MemoryStorage (data lost on tab close).
Web Workers - Enable background model loading and inference without blocking the main UI thread. Module workers (for ES module imports in workers) require newer browser versions.
SharedArrayBuffer - Enables multi-threaded WASM inference for improved performance. Requires Cross-Origin Isolation headers (COOP/COEP). Not required for basic functionality.
Web Locks - Used for cross-tab model loading coordination (prevents multiple tabs from downloading the same model simultaneously). Falls back to InMemoryLockManager when unavailable.
BroadcastChannel - Used for cross-tab VectorDB synchronization. Falls back to LocalStorageBroadcaster when unavailable.

Fallback Strategies

Chrome Desktop supports all LocalMode features. No fallbacks needed. This is the recommended development target - if your app works in Chrome Desktop, you've covered the majority of your user base. The main consideration is WebGPU availability: Chrome 113+ has full support, but users on Chrome 80-112 will fall back to WASM-based inference (wllama, Transformers.js).

LocalMode is designed with progressive enhancement in mind. The core principle: detect capabilities at runtime and use the best available path. The @localmode/core package exports detection utilities for this purpose:

import {
  isWebGPUSupported,
  isIndexedDBSupported,
  isCrossOriginIsolated,
  detectCapabilities,
  recommendModels,
} from '@localmode/core';

async function detectAndConfigure() {
  const caps = await detectCapabilities();
  console.log(caps);
  // caps.features.webgpu, caps.hardware.memory (GB), caps.storage.availableBytes

  // isWebGPUSupported() is async - it must be awaited
  if (await isWebGPUSupported()) {
    // Use @localmode/webllm for GPU-accelerated inference
  }

  // recommendModels() is synchronous: capabilities first, options second
  const recommendations = recommendModels(caps, {
    task: 'generation',
    maxSizeMB: 1500,
  });
}

Recommended Providers

For Chrome Desktop, the recommended LocalMode providers are:

WebLLM (WebGPU) - Use when WebGPU is confirmed available. Provides the fastest LLM inference.
wllama (WASM) - Universal LLM inference via WASM. Works without WebGPU. The safe choice for broad compatibility.
Transformers.js - Broadest model catalog for non-LLM tasks (embeddings, classification, vision, audio). WASM-based, works everywhere.
Chrome Built-in AI - Zero model download. Requires Chrome 138+ with flag. Provides summarization, translation, and writing assistance via Gemini Nano.

Recommended Models

The following models are tested and recommended for Chrome Desktop:

Model	Provider
Qwen3-4B-q4f16_1-MLC	WebLLM (WebGPU)
Xenova/bge-small-en-v1.5	Transformers.js
onnx-community/moonshine-base-ONNX	Transformers.js

These models are chosen for their compatibility with Chrome Desktop's capabilities and constraints. They represent the best balance of quality, size, and performance for this platform.

Known Issues

Memory pressure: Chrome may evict IndexedDB data under storage pressure. Use navigator.storage.persist() to request persistent storage. GPU memory: Large WebLLM models (4GB+) may compete with browser rendering for GPU VRAM - monitor for visual glitches on integrated graphics.

Mitigation Strategies

When building applications that target Chrome Desktop, follow these practices:

Always detect before loading - Use await isWebGPUSupported(), isIndexedDBSupported(), and await detectCapabilities() before attempting to load models or create storage. Never assume a feature is available.
Wrap model loading in try/catch - Even when detection succeeds, model loading can fail due to memory pressure, network issues, or browser bugs. Always have a fallback path that attempts a smaller model.
Pick models with recommendModels() - Pass the detected capabilities to recommendModels(caps, { task }) to select a model appropriate for the current device. It is the recommended pattern for production deployments.
Test on real hardware - Browser DevTools device emulation does not accurately simulate memory limits, GPU capabilities, or storage quotas. Test on actual target hardware.
Monitor storage quota - Use getStorageQuota() to check available space before downloading large models. Inform users if storage is insufficient rather than failing silently.

Web Standards References

Webgpu Support - compatibility guide
Chrome Android - compatibility guide
Text Generation - task guide

Methodology

Browser feature support data on this page is sourced from caniuse.com support tables and MDN Web Docs compatibility tables, cross-referenced with LocalMode's runtime feature detection (packages/core/src/capabilities/features.ts). Version numbers reflect the Chrome stable release in which each feature shipped enabled by default: WebGPU (Chrome 113, confirmed via caniuse.com and the official Chrome launch post), WebAssembly (Chrome 57), Web Locks (Chrome 69), BroadcastChannel (Chrome 54). The IndexedDB storage quota (up to 60% of total disk size per origin) is sourced from the MDN Storage quotas and eviction criteria article. Chrome Built-in AI API stable availability is sourced from the Chrome for Developers I/O 2025 announcement. Browser support evolves - verify current support with the linked Sources before making production decisions.

Sources

WebGPU - caniuse.com - Chrome 113+ desktop support confirmed
Chrome ships WebGPU - Chrome for Developers - official Chrome 113 stable announcement
WebGPU API - MDN Web Docs - feature reference and compatibility table
WebAssembly - caniuse.com - Chrome 57+ desktop support confirmed
WebAssembly SIMD - caniuse.com - Chrome 91+ desktop support confirmed
Web Locks API - MDN Web Docs - Chrome 69+ desktop support
Web Locks API - caniuse.com - Chrome 69+ desktop support confirmed
BroadcastChannel - caniuse.com - Chrome 54+ desktop support confirmed
Storage quotas and eviction criteria - MDN Web Docs - Chrome IndexedDB quota: up to 60% of total disk size per origin
AI APIs are in stable and origin trials - Chrome for Developers - Chrome 138: Summarizer, Translator, Language Detector stable; Prompt API stable for extensions
Built-in AI APIs - Chrome for Developers - Chrome Built-in AI overview and per-API availability
IndexedDB API - MDN Web Docs - feature reference

Frequently Asked Questions