OCR

Extract text from images using OCR models — from lightweight line-level TrOCR to document-level vision-language models like GLM-OCR and LightOnOCR-2. Works with handwritten text, printed documents, tables, formulas, and screenshots.

For the full API reference (extractText(), extractTextMany(), options, result types, and custom providers), see the Core OCR guide.

See it in action

Try OCR Scanner for a working demo with model selection.

Model | Size | Type | Strength | Use Case
Xenova/trocr-small-printed | ~120MB | Line-level | Printed text | Quick extraction, screenshots (recommended for getting started)
Xenova/trocr-small-handwritten | ~120MB | Line-level | Handwritten text | Notes, forms, handwriting
onnx-community/GLM-OCR-ONNX | ~652MB | Document-level | Tables, formulas, structured data | Complex documents, invoices, receipts
onnx-community/LightOnOCR-2-1B-ONNX | ~700MB | Document-level | Speed, 11 languages | High-throughput document scanning

Line-level vs Document-level

TrOCR models process one line of text at a time (~120MB, fast). GLM-OCR and LightOnOCR-2 are vision-language models that process full document images (~650-700MB, WebGPU recommended) with support for tables, formulas, and structured extraction via prompt-based modes.
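
The split above can be expressed as a small selection rule. The sketch below is illustrative only: `pickOcrModelId` and the `OcrJob` shape are hypothetical names, not part of @localmode/transformers. Only the model IDs are taken from the table on this page.

```typescript
// Hypothetical helper (not a LocalMode API): maps a rough description of the
// input to one of the model IDs listed in the table on this page.
type OcrJob = {
  layout: 'single-line' | 'document';
  handwritten?: boolean;           // only consulted for single-line input
  needsTablesOrFormulas?: boolean; // only consulted for document input
};

function pickOcrModelId(job: OcrJob): string {
  if (job.layout === 'single-line') {
    // Line-level TrOCR models (~120MB each)
    return job.handwritten
      ? 'Xenova/trocr-small-handwritten'
      : 'Xenova/trocr-small-printed';
  }
  // Document-level models: GLM-OCR is listed for tables and formulas,
  // LightOnOCR-2 for speed and multilingual scanning.
  return job.needsTablesOrFormulas
    ? 'onnx-community/GLM-OCR-ONNX'
    : 'onnx-community/LightOnOCR-2-1B-ONNX';
}
```

The returned ID can then be passed to transformers.ocr(...) in the same way as the hard-coded IDs in the examples on this page.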

Document Scanner Example

Based on the OCR Scanner showcase app:

import { transformers } from '@localmode/transformers';
import { extractText } from '@localmode/core';

// Line-level OCR (TrOCR)
const model = transformers.ocr('Xenova/trocr-small-printed');

// `dataUrl` is assumed to hold the image as a data URL string (a Blob also works)
const { text } = await extractText({
  model,
  image: dataUrl,
});

Generative OCR (GLM-OCR / LightOnOCR-2)

// Document-level OCR with prompt-based modes
const model = transformers.ocr('onnx-community/GLM-OCR-ONNX');

// Text extraction (default)
const { text } = await extractText({ model, image: documentImage });

// Table recognition
const { text: table } = await extractText({
  model,
  image: tableImage,
  prompt: 'Table Recognition:',
});

// Formula recognition
const { text: formula } = await extractText({
  model,
  image: mathImage,
  prompt: 'Formula Recognition:',
});
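
To avoid scattering prompt strings through application code, the modes can be kept in one typed lookup. This is a minimal sketch: `OcrMode`, `OCR_PROMPTS`, and `ocrPromptFor` are hypothetical names introduced here. The three prompt strings are the ones this page lists. Whether passing 'Text Recognition:' explicitly behaves identically to omitting the prompt is an assumption.

```typescript
// Hypothetical names; only the prompt strings come from this page.
type OcrMode = 'text' | 'table' | 'formula';

const OCR_PROMPTS: Record<OcrMode, string> = {
  text: 'Text Recognition:',
  table: 'Table Recognition:',
  formula: 'Formula Recognition:',
};

// Returns the prompt string for a given mode.
function ocrPromptFor(mode: OcrMode): string {
  return OCR_PROMPTS[mode];
}
```

A call would then look like extractText({ model, image: tableImage, prompt: ocrPromptFor('table') }), with the compiler rejecting any mode outside the three listed.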

Image Input Formats

The image parameter accepts:

  • string — Data URL (data:image/jpeg;base64,...)
  • Blob — Image blob from file input
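
When the image arrives as a string, a quick shape check before calling extractText() can catch plain URLs or empty values early. The function below is a hypothetical pre-flight check, not a LocalMode API, and it assumes base64-encoded image data URLs of the form shown above. Whether the library also accepts other data URL encodings is not covered here.

```typescript
// Hypothetical pre-flight check (not part of @localmode/core): accepts only
// base64-encoded image data URLs such as data:image/jpeg;base64,...
function isImageDataUrl(value: string): boolean {
  return /^data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}$/i.test(value);
}
```

Strings that fail the check, such as an https:// link to an image, would need to be fetched and converted to a Blob or data URL first.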

Best Practices

OCR Tips

  1. Choose the right model — TrOCR for quick single-line extraction, GLM-OCR/LightOnOCR-2 for full documents
  2. Image quality matters — Clear, well-lit images give much better results
  3. Use prompt modes — Generative models support 'Text Recognition:', 'Table Recognition:', and 'Formula Recognition:' prompts
  4. WebGPU for generative models — GLM-OCR and LightOnOCR-2 run significantly faster with WebGPU; WASM works but is slower
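
Tip 4 implies a runtime decision between WebGPU and WASM. One way to make it is to check whether the browser exposes the WebGPU entry point (`navigator.gpu`). The sketch below is an assumption-laden illustration: `NavigatorLike` and `preferredOcrBackend` are hypothetical names, and how a backend choice is passed to the OCR model is not shown on this page, so the function only returns a label.

```typescript
// Minimal structural type so the sketch compiles without DOM or WebGPU typings.
type NavigatorLike = { gpu?: unknown };

// Hypothetical helper: returns 'webgpu' when the WebGPU entry point is
// exposed on the given navigator-like object, otherwise 'wasm'.
function preferredOcrBackend(nav: NavigatorLike | undefined): 'webgpu' | 'wasm' {
  return nav !== undefined && nav.gpu !== undefined ? 'webgpu' : 'wasm';
}
```

In a browser, the global navigator object would be the argument. The presence of `navigator.gpu` does not guarantee a usable adapter, since `navigator.gpu.requestAdapter()` can still resolve to null, so treat the result as a hint and not a guarantee.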

For full PDF text extraction, use @localmode/pdfjs instead — it extracts text directly from PDF structure without OCR. Use OCR only for scanned documents or images of text.

Showcase Apps

App | Description | Links
OCR Scanner | Extract text from images and scanned documents | Demo · Source
