17 AI Features You Can Add to Your App Without an API Key
A practical guide to 17 production-ready AI features that run entirely in the browser - no API keys, no servers, no recurring costs. Each includes working code, model recommendations, and a live demo you can try right now.
Every AI feature you add to your app usually means another API key to manage, another vendor to trust with your users' data, and another line item on your monthly bill that scales with success.
But modern browsers are powerful enough to run real ML models - the same transformer architectures behind cloud APIs - directly on the user's device. No servers. No API keys. No per-request costs. Data never leaves the browser tab.
This is not a theoretical exercise. Below are 17 features you can ship today, each with a working code snippet using real function signatures from LocalMode, the model that powers it, and a link to a live demo running at localmode.ai.
1. Semantic Search
Find documents by meaning, not just keywords. Embed text into vectors and search by cosine similarity. Users can search "budget concerns" and find a note titled "Q3 financial projections" - because the model understands they are related.
import { embed, createVectorDB } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const model = transformers.embedding('Xenova/bge-small-en-v1.5');
const db = await createVectorDB({ name: 'notes', dimensions: 384 });
const { embedding } = await embed({ model, value: 'budget concerns' });
const results = await db.search(embedding, { topK: 5 });
Model: Xenova/bge-small-en-v1.5 (33 MB) | Try the live demo
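The ranking behind a vector search can be illustrated without any library: embeddings are compared by cosine similarity, and the K closest vectors win. A minimal, self-contained sketch of that math (not LocalMode's internals, just the underlying computation):

```javascript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-K search over an in-memory index - the same
// ranking a vector DB performs, minus the storage and indexing.
function searchTopK(index, queryVec, topK) {
  return index
    .map(({ id, vector }) => ({ id, score: cosineSimilarity(queryVec, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const index = [
  { id: 'a', vector: [1, 0, 0] },
  { id: 'b', vector: [0.9, 0.1, 0] },
  { id: 'c', vector: [0, 1, 0] },
];
const results = searchTopK(index, [1, 0, 0], 2);
// results[0].id === 'a' (exact match, score 1)
```

At the 384 dimensions bge-small produces, this brute-force scan stays fast for thousands of documents; a real vector DB adds persistence and indexing on top of the same comparison.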
2. Sentiment Analysis
Classify customer reviews, support tickets, or social mentions as positive or negative in real time. Batch-process thousands of texts without sending a single request to any external service.
import { classify } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { label, score } = await classify({
model: transformers.classifier('Xenova/distilbert-base-uncased-finetuned-sst-2-english'),
text: 'This product exceeded my expectations!',
});
// label: "POSITIVE", score: 0.9998
Model: Xenova/distilbert-base-uncased-finetuned-sst-2-english (67 MB) | Try the live demo
3. Text Summarization
Condense long articles, meeting notes, or support threads into key points. Control output length with maxLength and minLength parameters.
import { summarize } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { summary } = await summarize({
model: transformers.summarizer('Xenova/distilbart-cnn-6-6'),
text: longArticle,
maxLength: 100,
minLength: 30,
});
Model: Xenova/distilbart-cnn-6-6 (300 MB) | Try the live demo
4. Language Translation
Translate text between 20+ language pairs completely offline. Helsinki-NLP's OPUS-MT models cover the major European and Asian languages, each as a compact per-pair download.
import { translate } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { translation } = await translate({
model: transformers.translator('Xenova/opus-mt-en-de'),
text: 'Hello, how are you?',
targetLanguage: 'de',
});
// translation: "Hallo, wie geht es dir?"
Model: Xenova/opus-mt-en-* (100-300 MB per pair) | Try the live demo
5. Image Captioning
Generate natural language descriptions of images automatically. Useful for accessibility alt-text, content moderation pipelines, or building searchable image libraries.
import { captionImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { caption } = await captionImage({
model: transformers.captioner('onnx-community/Florence-2-base-ft'),
image: imageBlob,
});
// caption: "a golden retriever playing with a ball in a park"
Model: onnx-community/Florence-2-base-ft (460 MB) | Try the live demo
6. Object Detection
Locate and label objects in images with bounding boxes. D-FINE achieves strong accuracy at a fraction of the size of YOLO-family models, and runs well on WebGPU-enabled browsers.
import { detectObjects } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { objects } = await detectObjects({
model: transformers.objectDetector('onnx-community/dfine_n_coco-ONNX'),
image: imageBlob,
threshold: 0.7,
});
for (const obj of objects) {
console.log(`${obj.label} at (${obj.box.x}, ${obj.box.y}): ${(obj.score * 100).toFixed(1)}%`);
}
Model: onnx-community/dfine_n_coco-ONNX (130 MB) | Try the live demo
7. OCR (Optical Character Recognition)
Extract text from photos, scanned documents, and screenshots. TrOCR handles both printed and handwritten text, making it suitable for receipt scanning, form digitization, and note capture.
import { extractText } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { text } = await extractText({
model: transformers.ocr('Xenova/trocr-small-printed'),
image: scannedDocument,
});
Model: Xenova/trocr-small-printed (10-50 MB) | Try the live demo
8. Document Redaction (PII Detection)
Detect and redact personally identifiable information before it reaches storage or embeddings. Combine regex-based pattern matching for structured PII (emails, SSNs, credit cards) with NER for names and organizations.
import { redactPII } from '@localmode/core';
const redacted = redactPII(
'Contact John Smith at john@example.com or 555-123-4567',
{ emails: true, phones: true }
);
// redacted: "Contact John Smith at [EMAIL_REDACTED] or [PHONE_REDACTED]"
Model: Pattern-based (0 MB, zero-dependency) + optional NER (110 MB) | Try the live demo
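The pattern-based half of this pipeline needs no model at all. As a minimal sketch (the regexes below are illustrative, not LocalMode's actual patterns), structured PII can be caught with plain string replacement - note that "John Smith" survives, which is exactly why the NER pass exists:

```javascript
// Illustrative patterns - production PII detection needs broader
// coverage (international phone formats, obfuscated emails, etc.).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_RE = /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g;

function redactPatterns(text) {
  return text
    .replace(EMAIL_RE, '[EMAIL_REDACTED]')
    .replace(PHONE_RE, '[PHONE_REDACTED]');
}

const out = redactPatterns('Contact John Smith at john@example.com or 555-123-4567');
// "Contact John Smith at [EMAIL_REDACTED] or [PHONE_REDACTED]"
```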
9. Voice Transcription
Convert audio recordings to text using Moonshine, a lightweight speech recognition model optimized for browser inference. Supports timestamps for subtitle generation and works across accents.
import { transcribe } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { text, segments } = await transcribe({
model: transformers.speechToText('onnx-community/moonshine-tiny-ONNX'),
audio: audioBlob,
returnTimestamps: true,
});
segments?.forEach(seg => {
console.log(`[${seg.start}s - ${seg.end}s] ${seg.text}`);
});
Model: onnx-community/moonshine-tiny-ONNX (50 MB) | Try the live demo
10. Text-to-Speech
Generate natural-sounding speech from text entirely in the browser. Create audiobooks, accessibility features, or voice interfaces without sending text to any external service.
import { synthesizeSpeech } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { audio, sampleRate } = await synthesizeSpeech({
model: transformers.textToSpeech('Xenova/mms-tts-eng'),
text: 'Welcome to the future of local AI.',
});
Model: Xenova/mms-tts-eng (30 MB) | Try the live demo
11. Smart Autocomplete
Predict the most likely word to fill a gap in text using masked language models. This powers writing assistants, search suggestions, and form auto-fill - all without sending keystrokes to a server.
import { fillMask } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { predictions } = await fillMask({
model: transformers.fillMask('onnx-community/ModernBERT-base-ONNX'),
text: 'The capital of France is [MASK].',
topK: 5,
});
// predictions[0].token: "paris", predictions[0].score: 0.95
Model: onnx-community/ModernBERT-base-ONNX (150 MB) | Try the live demo
12. Email / Intent Classification
Classify emails, support tickets, or any text into custom categories - without training a model. Zero-shot classification lets you define labels at runtime and the model figures out which ones fit.
import { classifyZeroShot } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { labels, scores } = await classifyZeroShot({
model: transformers.zeroShot('Xenova/mobilebert-uncased-mnli'),
text: 'I need to reset my password and update billing info',
candidateLabels: ['account access', 'billing', 'technical support', 'feedback'],
});
// labels: ["account access", "billing", ...], scores: [0.72, 0.68, ...]
Model: Xenova/mobilebert-uncased-mnli (25 MB) | Try the live demo
13. Named Entity Recognition
Extract people, organizations, locations, and other entities from unstructured text. NER is the backbone of document understanding, knowledge graph construction, and the PII detection pipeline in feature #8.
import { extractEntities } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { entities } = await extractEntities({
model: transformers.ner('Xenova/bert-base-NER'),
text: 'Tim Cook announced that Apple will open a new office in Berlin.',
});
// entities: [
// { text: "Tim Cook", type: "PERSON", score: 0.99 },
// { text: "Apple", type: "ORG", score: 0.98 },
// { text: "Berlin", type: "LOC", score: 0.97 }
// ]
Model: Xenova/bert-base-NER (110 MB) | Try the live demo
14. Question Answering
Given a passage of text and a question, extract the precise answer span with a confidence score. Ideal for FAQ bots, documentation search, and customer support - no LLM required.
import { answerQuestion } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { answer, score } = await answerQuestion({
model: transformers.questionAnswering('Xenova/distilbert-base-cased-distilled-squad'),
question: 'What is the capital of France?',
context: 'France is a country in Europe. Its capital is Paris, known for the Eiffel Tower.',
});
// answer: "Paris", score: 0.98
Model: Xenova/distilbert-base-cased-distilled-squad (100 MB) | Try the live demo
15. Background Removal
Segment foreground objects from the background and export transparent PNGs. RMBG-1.4 handles complex scenes, hair details, and semi-transparent objects with surprising accuracy for its size.
import { segmentImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { masks } = await segmentImage({
model: transformers.segmenter('briaai/RMBG-1.4'),
image: photoBlob,
});
Model: briaai/RMBG-1.4 (170 MB) | Try the live demo
16. Photo Enhancement (Super Resolution)
Upscale low-resolution images by 2x or 4x using neural super resolution. Restore old photos, enhance thumbnails, or improve screenshots - all processed locally without uploading images anywhere.
import { imageToImage } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const { image } = await imageToImage({
model: transformers.imageToImage('Xenova/swin2SR-lightweight-x2-64'),
image: lowResPhoto,
scale: 2,
});
Model: Xenova/swin2SR-lightweight-x2-64 (50 MB) | Try the live demo
17. Cross-Modal Image Search
Search photos by typing a text description, or find visually similar images by uploading a reference. CLIP embeds text and images into the same vector space, enabling true cross-modal retrieval.
import { embed, embedImage, createVectorDB } from '@localmode/core';
import { transformers } from '@localmode/transformers';
const model = transformers.multimodalEmbedding('Xenova/clip-vit-base-patch32');
const db = await createVectorDB({ name: 'photos', dimensions: 512 });
// Index photos by their visual content
const { embedding: imgVec } = await embedImage({ model, image: photoBlob });
await db.add({ id: 'photo-1', vector: imgVec });
// Search with text
const { embedding: queryVec } = await embed({ model, value: 'sunset over the ocean' });
const results = await db.search(queryVec, { topK: 10 });
Model: Xenova/clip-vit-base-patch32 (~350 MB) | Try the live demo
Summary Table
| # | Feature | Function | Model | Size | Quality vs Cloud |
|---|---|---|---|---|---|
| 1 | Semantic Search | embed() | Xenova/bge-small-en-v1.5 | 33 MB | ~99% of OpenAI |
| 2 | Sentiment Analysis | classify() | Xenova/distilbert-base-uncased-finetuned-sst-2-english | 67 MB | ~95% of GPT-4o |
| 3 | Text Summarization | summarize() | Xenova/distilbart-cnn-6-6 | 300 MB | ~85% of GPT-4o |
| 4 | Language Translation | translate() | Xenova/opus-mt-en-* | 100-300 MB | ~80% of DeepL |
| 5 | Image Captioning | captionImage() | onnx-community/Florence-2-base-ft | 460 MB | ~85% of GPT-4o |
| 6 | Object Detection | detectObjects() | onnx-community/dfine_n_coco-ONNX | 130 MB | ~90% of AWS Rekognition |
| 7 | OCR | extractText() | Xenova/trocr-small-printed | 10-50 MB | ~75% of Google Vision |
| 8 | Document Redaction | redactPII() | Pattern-based + NER | 0-110 MB | ~95% of GPT-4o |
| 9 | Voice Transcription | transcribe() | onnx-community/moonshine-tiny-ONNX | 50 MB | ~80% of Whisper API |
| 10 | Text-to-Speech | synthesizeSpeech() | Xenova/mms-tts-eng | 30 MB | ~70% of ElevenLabs |
| 11 | Smart Autocomplete | fillMask() | onnx-community/ModernBERT-base-ONNX | 150 MB | ~90% of GPT-4o |
| 12 | Email Classification | classifyZeroShot() | Xenova/mobilebert-uncased-mnli | 25 MB | ~85% of GPT-4o |
| 13 | Named Entity Recognition | extractEntities() | Xenova/bert-base-NER | 110 MB | ~95% of GPT-4o |
| 14 | Question Answering | answerQuestion() | Xenova/distilbert-base-cased-distilled-squad | 100 MB | ~90% of GPT-4o |
| 15 | Background Removal | segmentImage() | briaai/RMBG-1.4 | 170 MB | ~90% of remove.bg |
| 16 | Photo Enhancement | imageToImage() | Xenova/swin2SR-lightweight-x2-64 | 50 MB | ~80% of Topaz AI |
| 17 | Cross-Modal Search | embedImage() | Xenova/clip-vit-base-patch32 | ~350 MB | ~85% of OpenAI CLIP |
Total size if you use every feature: under 3 GB. In practice, most apps use 2-4 models that together weigh 100-500 MB, cached in IndexedDB after the first download, and available offline forever.
What All 17 Features Have in Common
No API key. You npm install a package, import a function, and call it. There is no key provisioning, no environment variables, no billing dashboard.
No server. Models run in the browser via WebAssembly and WebGPU. Your backend never sees the data, which means you never have to worry about data residency, GDPR consent flows for third-party processors, or the liability of storing user content on your infrastructure.
No recurring cost. Cloud AI pricing is per-request. Local AI pricing is per-download - and the download is cached. Your thousandth user costs the same as your first: zero.
No latency penalty for simple tasks. Embedding a sentence takes 5-15ms locally. A cloud round-trip to do the same thing takes 100-300ms including network overhead. For interactive features like search-as-you-type, local inference is not just cheaper - it is faster.
When to use cloud instead
Local models are smaller than cloud models. For tasks requiring broad world knowledge (complex multi-step reasoning, creative writing, code generation), a 1-4 GB local LLM will not match GPT-4o or Claude. Use local AI for the focused, high-volume tasks in this list. Use cloud AI for the open-ended tasks where quality is paramount and latency is acceptable.
Getting Started
Every snippet above uses two packages:
npm install @localmode/core @localmode/transformers
@localmode/core provides the functions (embed, classify, transcribe, etc.) with zero dependencies. @localmode/transformers provides the HuggingFace Transformers.js model implementations. The architecture is interface-based - you can swap providers without changing application code.
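As a hypothetical illustration of that interface-based design (the names below are invented for this sketch, not LocalMode's actual API), provider swapping works because application code depends only on an object shape, never on a concrete backend:

```javascript
// Hypothetical provider interface: any object exposing embed(text)
// can back the application's query code.
function buildQueryVector(provider, query) {
  return provider.embed(query).embedding;
}

// Two interchangeable stub providers standing in for real model backends
// (e.g. a Transformers.js provider vs. a future ONNX Runtime provider).
const providerA = { embed: (t) => ({ embedding: [t.length, 1] }) };
const providerB = { embed: (t) => ({ embedding: [t.length, 2] }) };

// The application code is identical for both - swap without rewrites.
const vecA = buildQueryVector(providerA, 'hello');
const vecB = buildQueryVector(providerB, 'hello');
```

The same pattern also makes unit testing cheap: a stub provider like the ones above lets you test application logic without downloading any model.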
For React applications, add @localmode/react for hooks that handle loading states, cancellation, and error boundaries:
npm install @localmode/react
Models are downloaded from HuggingFace Hub on first use and cached in IndexedDB. Subsequent loads are instant and work fully offline.
Methodology
All function signatures, model IDs, and code snippets in this post are taken directly from the LocalMode source code. Every snippet uses the actual exported API.
Model sizes are based on the quantized ONNX weights as downloaded by Transformers.js and reported in the LocalMode showcase app. Quality comparisons reference the benchmarks published in our Local AI vs. Cloud analysis, which tested against OpenAI, Google Cloud, AWS, Cohere, ElevenLabs, and DeepL on standard academic benchmarks (MTEB, SQuAD, BLEU, WER, COCO mAP).
All 17 demo applications are open source and available at localmode.ai.
Try it yourself
Visit localmode.ai to try 30+ AI demo apps running entirely in your browser. No sign-up, no API keys, no data leaves your device.
Read the Getting Started guide to add local AI to your application in under 5 minutes.