Models

The LiteRT catalog ships three verified .litertlm models: Gemma 4 E2B, Gemma 4 E4B, and Qwen3 0.6B. This page also explains how to load gated Gemma models with your own HuggingFace token.

LiteRT Model Catalog

The @localmode/litert package ships a curated catalog (LITERT_MODELS) with three models, all verified to load and generate end-to-end with @litert-lm/core@^0.12.1 in real Chrome.

Gemma 4 E2B and Gemma 4 E4B are the two models Google officially lists as supported by the LiteRT-LM JS API — they use the web-optimized *-it-web.litertlm builds published for browser WebGPU loading. Qwen3 0.6B is a small general .litertlm model included as a lightweight option.

URL formats

Every model can be referenced three ways: the catalog shorthand (e.g. 'gemma-4-E2B'), a HuggingFace repo:file shorthand, or a full URL. Use the latter two — combined with the modelUrl and contextLength overrides — to load .litertlm files outside the catalog, including the gated Gemma models below.
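
To make the three reference forms concrete, the sketch below classifies a reference string and expands a repo:file shorthand into a download URL. It is not the package's own resolver: the exact `owner/repo:file` syntax, the `main` revision, and the names `ModelRef` and `parseModelRef` are assumptions made for illustration. Only the `https://huggingface.co/<repo>/resolve/main/<file>` URL pattern is taken from the gated-model example on this page.

```typescript
// Hypothetical helper, not part of @localmode/litert.
type ModelRef =
  | { kind: 'catalog'; id: string }
  | { kind: 'url'; url: string };

// Catalog IDs as listed on this page.
const CATALOG_IDS = ['gemma-4-E2B', 'gemma-4-E4B', 'qwen3-0.6B'];

function parseModelRef(ref: string): ModelRef {
  // 1. Full URL: use it as-is.
  if (/^https?:\/\//.test(ref)) {
    return { kind: 'url', url: ref };
  }
  // 2. Assumed shorthand "owner/repo:path/to/file.litertlm".
  const match = /^([^/:\s]+\/[^/:\s]+):(.+\.litertlm)$/.exec(ref);
  if (match) {
    const [, repo, file] = match;
    return { kind: 'url', url: `https://huggingface.co/${repo}/resolve/main/${file}` };
  }
  // 3. Catalog shorthand.
  if (CATALOG_IDS.includes(ref)) {
    return { kind: 'catalog', id: ref };
  }
  throw new Error(`Unrecognized model reference: ${ref}`);
}
```

Checking the full-URL form first keeps the shorthand pattern from ever matching the `:` in `https://`.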

Catalog

| ID | Name | Family | Size | Context | Backend | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| gemma-4-E2B | Gemma 4 E2B | Gemma | 2.0GB | 8K | WebGPU only | Officially supported by the LiteRT-LM JS API |
| gemma-4-E4B | Gemma 4 E4B | Gemma | 3.0GB | 8K | WebGPU only | Officially supported by the LiteRT-LM JS API; higher quality |
| qwen3-0.6B | Qwen3 0.6B | Qwen | 614MB | 4K | WebGPU or CPU | Smallest catalog model, fast loading |

Multimodal models, text-only API (for now)

The Gemma 4 models are multimodal: their .litertlm files ship vision and audio encoders. As of @litert-lm/core@0.12.1 (the current version), however, the LiteRT-LM JavaScript API does not expose those modalities. It accepts the visionModalityEnabled and audioModalityEnabled flags, but enabling either one throws "Vision/Audio options should not be null", because the JS API provides no way to supply the executor options the engine requires (verified by direct testing). @localmode/litert is therefore text-only for now. Multimodal (image and audio) input may arrive in a future @litert-lm/core release.

Gemma 4 is WebGPU-only

The Gemma 4 *-it-web.litertlm builds are GPU-compiled — their TFLite sections carry a gpu_artisan backend constraint, so they cannot run on the CPU backend. Loading a Gemma 4 model on a browser without WebGPU (or with backend: 'CPU') fails fast with a clear ModelLoadError. Qwen3 0.6B is a portable build that runs on either backend.
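
Because a Gemma 4 load fails without WebGPU, it can be worth checking for WebGPU before starting a multi-gigabyte download. The following is a minimal sketch, assuming that the presence of `navigator.gpu` is a good enough signal; `hasWebGPU` and `chooseModelId` are hypothetical helper names, and only the model IDs come from the catalog.

```typescript
// Hypothetical helpers, not part of @localmode/litert.

// Returns true when the runtime exposes the WebGPU entry point (navigator.gpu).
// This only checks that the API is present; requesting an adapter can still fail.
function hasWebGPU(): boolean {
  const nav = (globalThis as unknown as { navigator?: { gpu?: unknown } }).navigator;
  return nav?.gpu !== undefined;
}

// Prefer the GPU-compiled Gemma 4 build when WebGPU is present,
// otherwise fall back to the portable Qwen3 build that also runs on CPU.
function chooseModelId(webgpuAvailable: boolean): 'gemma-4-E2B' | 'qwen3-0.6B' {
  return webgpuAvailable ? 'gemma-4-E2B' : 'qwen3-0.6B';
}

const selectedId = chooseModelId(hasWebGPU());
```

The resulting ID can then be passed to litert.languageModel(). Keeping the choice in a pure function that takes a boolean makes the fallback easy to test without a browser.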

Gemma 4 E2B

import { litert } from '@localmode/litert';

const model = litert.languageModel('gemma-4-E2B');

  • Size: 2.0GB (gemma-4-E2B-it-web.litertlm)
  • Context: 8192 tokens
  • License: Gemma
  • Backend: WebGPU only (GPU-compiled build — cannot run on CPU)
  • Status: Verified end-to-end on Chrome 145 with WebGPU (2026-05-20)
  • Best for: The default recommendation — one of the two models Google officially supports for the JS API

Gemma 4 E4B

const model = litert.languageModel('gemma-4-E4B');

  • Size: 3.0GB (gemma-4-E4B-it-web.litertlm)
  • Context: 8192 tokens
  • License: Gemma
  • Backend: WebGPU only (GPU-compiled build — cannot run on CPU)
  • Status: Verified end-to-end on Chrome 145 with WebGPU (2026-05-20)
  • Best for: Higher quality than E2B when the larger download is acceptable

Qwen3 0.6B

const model = litert.languageModel('qwen3-0.6B');

  • Size: 614MB (Qwen3-0.6B.litertlm)
  • Context: 4096 tokens
  • Parameters: ~600M
  • License: Apache-2.0
  • Backend: WebGPU or CPU (portable build — verified on both)
  • Status: Verified end-to-end on Chrome 145 (2026-05-20)
  • Best for: Fast loading, quick prototyping, and the only catalog model that runs on the CPU backend

Gated models — not in catalog

The following .litertlm files exist on HuggingFace but are not in the curated catalog. They are gated behind a HuggingFace account and a click-through Gemma license, which an unauthenticated browser-side fetch() cannot satisfy. To use them, accept the Gemma license once on the model's HuggingFace page, create a User Access Token, and load the file via the modelUrl override.

| Name | HuggingFace repo |
| --- | --- |
| Gemma 3 1B | litert-community/Gemma3-1B-IT |
| FunctionGemma 270M | google/functiongemma-270m-litert-lm |
| Gemma 3n E2B/E4B | google/gemma-3n-E2B-it-litert-lm |

Loading pattern

Resolve the gated file yourself with an Authorization header, then hand the resulting URL to litert.languageModel() via modelUrl:

import { litert } from '@localmode/litert';

// 1. Accept the Gemma license on HuggingFace, then mint an Access Token.
const HF_TOKEN = import.meta.env.VITE_HF_TOKEN;

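// 2. Fetch with the token so HuggingFace redirects to a signed download URL.
const response = await fetch(
  'https://huggingface.co/litert-community/Gemma3-1B-IT/resolve/main/Gemma3-1B-IT_multi-prefill-seq_q8_ekv2048.litertlm',
  { headers: { Authorization: `Bearer ${HF_TOKEN}` } },
);
if (!response.ok) {
  // Typically a missing or invalid token, or a license that has not been accepted.
  throw new Error(`HuggingFace returned ${response.status} for the gated file`);
}
// Only the final URL is needed here, so stop this download instead of reading the body.
await response.body?.cancel();

// 3. Pass the resolved URL to LiteRT.
const model = litert.languageModel('gemma-3-1B', { modelUrl: response.url });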

Don't ship raw tokens to browsers

A Bearer token in client-side code is visible to anyone who opens DevTools. For production, proxy the resolve step through your backend (or a serverless Edge Function) and forward the signed redirect URL to the browser. The model bytes themselves are still downloaded directly by the user — only the token-bearing request is server-side.
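
The server-side resolve step could look roughly like the sketch below. It assumes that HuggingFace answers the authenticated resolve request with a 3xx redirect carrying a Location header, and that the server runtime's fetch exposes that response when called with `redirect: 'manual'` (Node.js does; browsers return an opaque response instead). `FetchLike`, `extractRedirectUrl`, and `resolveGatedModelUrl` are hypothetical names, not part of @localmode/litert.

```typescript
// Hypothetical server-side helper; none of these names come from @localmode/litert.

// Minimal shape of the fetch function this sketch relies on.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; redirect: 'manual' },
) => Promise<{ status: number; headers: { get(name: string): string | null } }>;

// Pure step: turn a redirect response into the URL the browser should download from.
function extractRedirectUrl(status: number, location: string | null, requestUrl: string): string {
  if (status < 300 || status >= 400 || location === null) {
    throw new Error(`Expected a redirect with a Location header, got status ${status}`);
  }
  // Location may be relative, so resolve it against the request URL.
  return new URL(location, requestUrl).toString();
}

// Runs on the server, so the token never reaches the browser.
async function resolveGatedModelUrl(
  fileUrl: string,
  token: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const response = await fetchImpl(fileUrl, {
    headers: { Authorization: `Bearer ${token}` },
    redirect: 'manual', // keep the 3xx response instead of following it
  });
  return extractRedirectUrl(response.status, response.headers.get('location'), fileUrl);
}
```

An endpoint would call resolveGatedModelUrl with the token read from server-side configuration and the runtime's global fetch as fetchImpl, then return the URL to the browser, which passes it as modelUrl. Signed URLs generally expire, so it is safer to resolve a fresh one per load than to cache it.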

Programmatic Access

Access the catalog at runtime via LITERT_MODELS:

import { LITERT_MODELS, getModelCategory } from '@localmode/litert';
import type { LiteRTModelId } from '@localmode/litert';

for (const [id, info] of Object.entries(LITERT_MODELS)) {
  const category = getModelCategory(info.sizeBytes);
  console.log(`[${category}] ${info.name}: ${info.size}`);
}

const modelId: LiteRTModelId = 'gemma-4-E2B';
const entry = LITERT_MODELS[modelId];
console.log(entry.url); // HuggingFace .litertlm URL

The LiteRTModelEntry shape:

interface LiteRTModelEntry {
  name: string;
  contextLength: number;
  sizeBytes: number;
  size: string;           // e.g. '2.0GB'
  description: string;
  url: string;            // HuggingFace .litertlm URL
  parameterCount: number;
  requiresWebGPU?: boolean; // true = GPU-compiled build, cannot run on CPU backend
}
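
As an illustration of how these fields can drive model selection, the sketch below filters entries by backend availability and minimum context length. It is self-contained rather than importing the package: `CatalogEntry` redeclares a subset of the fields above, the sample values are copied from the model details on this page, and `runnableModels` is a hypothetical helper. Real code would read LITERT_MODELS instead.

```typescript
// Subset of the LiteRTModelEntry fields, redeclared so the sketch is self-contained.
interface CatalogEntry {
  name: string;
  contextLength: number;
  requiresWebGPU?: boolean;
}

// Sample values copied from this page; real code would read LITERT_MODELS.
const sampleCatalog: Record<string, CatalogEntry> = {
  'gemma-4-E2B': { name: 'Gemma 4 E2B', contextLength: 8192, requiresWebGPU: true },
  'gemma-4-E4B': { name: 'Gemma 4 E4B', contextLength: 8192, requiresWebGPU: true },
  'qwen3-0.6B': { name: 'Qwen3 0.6B', contextLength: 4096 },
};

// Hypothetical helper: IDs of models that can run on the available backend
// and offer at least `minContext` tokens of context.
function runnableModels(
  catalog: Record<string, CatalogEntry>,
  webgpuAvailable: boolean,
  minContext = 0,
): string[] {
  return Object.entries(catalog)
    .filter(([, entry]) => webgpuAvailable || !entry.requiresWebGPU)
    .filter(([, entry]) => entry.contextLength >= minContext)
    .map(([id]) => id);
}

console.log(runnableModels(sampleCatalog, false));      // [ 'qwen3-0.6B' ]
console.log(runnableModels(sampleCatalog, true, 8192)); // [ 'gemma-4-E2B', 'gemma-4-E4B' ]
```

Treating a missing requiresWebGPU as "runs anywhere" matches the optional flag in the interface: only GPU-compiled builds need to set it.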

For per-instance overrides used when loading non-catalog models, see LiteRTModelSettings — notably modelUrl and contextLength — documented on the Overview page.

