Document QA

Answer questions about document images — invoices, forms, receipts — using document understanding models like Florence-2. The model reads the document image and extracts answers without separate OCR.

For full API reference (askDocument(), askTable(), options, result types, and custom providers), see the Core Document QA guide.

See it in action

Try Invoice QA for a working demo.

Recommended Models

Model	Size	Use Case
`onnx-community/Florence-2-base-ft`	~223MB	Document visual Q&A (invoices, forms, receipts)
`Xenova/donut-base-finetuned-docvqa`	~218MB	OCR-free document understanding (legacy)

Invoice QA Example

Based on the Invoice QA showcase app:

import { transformers } from '@localmode/transformers';
import { askDocument } from '@localmode/core';

const model = transformers.documentQA('onnx-community/Florence-2-base-ft');

async function askAboutDocument(
  imageDataUrl: string,
  question: string,
  signal?: AbortSignal, // optional: pass an AbortController's signal to cancel the request
) {
  const { answer, score } = await askDocument({
    model,
    document: imageDataUrl,
    question,
    abortSignal: signal,
  });

  return {
    answer,
    confidence: score,
    isReliable: score > 0.5,
  };
}

// Example questions for invoices:
// - "What is the invoice number?"
// - "What is the total amount?"
// - "What is the date?"
// - "Who is the vendor?"
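To pull several fields out of one invoice, you can ask the example questions one at a time and gather the answers. The sketch below is hypothetical: `askAll` and `AskFn` are illustrative names, not part of LocalMode, and `ask` stands in for a per-question function such as askAboutDocument above. Injecting it keeps the helper independent of any model, so it can be exercised with a stub.

```typescript
// Result shape returned by a per-question helper such as askAboutDocument above.
interface FieldAnswer {
  answer: string;
  confidence: number;
  isReliable: boolean;
}

// Hypothetical: any function that answers one question about one document.
type AskFn = (document: string, question: string) => Promise<FieldAnswer>;

// Ask each question in its own call and key the results by question text.
async function askAll(
  ask: AskFn,
  document: string,
  questions: readonly string[],
): Promise<Record<string, FieldAnswer>> {
  const results: Record<string, FieldAnswer> = {};
  for (const question of questions) {
    // Awaiting inside the loop keeps only one inference in flight at a time.
    results[question] = await ask(document, question);
  }
  return results;
}
```

Running the calls sequentially rather than with `Promise.all` trades speed for lower peak memory, which is usually the safer default for in-browser models.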

Table QA

The same documentQA() model also supports question answering on tabular data via askTable():

import { askTable } from '@localmode/core';

// `model` is the documentQA() model created in the Invoice QA example above.

const { answer, score, aggregator, cells } = await askTable({
  model,
  table: {
    headers: ['Product', 'Q1 Sales', 'Q2 Sales'],
    rows: [
      ['Widget A', '1200', '1500'],
      ['Widget B', '800', '950'],
      ['Widget C', '2000', '2200'],
    ],
  },
  question: 'Which product had the highest Q2 sales?',
});

console.log(answer);     // 'Widget C'
console.log(aggregator); // e.g., 'NONE' or 'MAX'
console.log(cells);      // ['2200']
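The example above passes the table as string headers plus rows of string cells. If your data starts as an array of objects (for example, parsed JSON), a small converter can build that shape. This is a minimal sketch: `toTableInput` and `TableInput` are hypothetical names, and the interface simply mirrors the object literal in the example rather than LocalMode's published type.

```typescript
// Mirrors the table object in the askTable() example: string headers, string cells.
interface TableInput {
  headers: string[];
  rows: string[][];
}

// Hypothetical helper: convert plain objects into the headers/rows shape.
// Column order follows the keys of the first record; keys that appear only
// in later records are ignored.
function toTableInput(records: ReadonlyArray<Record<string, unknown>>): TableInput {
  if (records.length === 0) {
    return { headers: [], rows: [] };
  }
  const headers = Object.keys(records[0]);
  const rows = records.map((record) =>
    headers.map((header) => {
      const value = record[header];
      // Missing values become empty cells; everything else is stringified.
      return value === undefined || value === null ? '' : String(value);
    }),
  );
  return { headers, rows };
}

const table = toTableInput([
  { Product: 'Widget A', 'Q1 Sales': 1200, 'Q2 Sales': 1500 },
  { Product: 'Widget B', 'Q1 Sales': 800, 'Q2 Sales': 950 },
]);
```

Stringifying numbers up front keeps every cell in the same form the example uses.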

Document Input Formats

The document parameter accepts:

string — Data URL of the document image
Blob — Image blob from file input or camera capture
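A Blob can be passed to askDocument() directly. If other code expects a string, as askAboutDocument above does with its `imageDataUrl` parameter, you can convert the Blob to a data URL first. In the browser, `FileReader.readAsDataURL` does this too; the sketch below uses only `Blob.arrayBuffer()` and `btoa`, so it also runs in Node 18+. The function names are hypothetical.

```typescript
// Encode raw bytes as a base64 data URL, e.g. "data:image/png;base64,...".
function bytesToDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    // btoa expects a "binary string": one char code (0-255) per byte.
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Read a Blob (file input, camera capture) and return it as a data URL string.
async function blobToDataUrl(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  // Fall back to a generic type if the Blob has none.
  return bytesToDataUrl(bytes, blob.type || 'application/octet-stream');
}
```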

Best Practices

Document QA Tips

Ask specific questions — "What is the total?" works better than "Tell me about this document"
Use clear images — High-resolution scans give much better results
Check confidence — Low scores indicate uncertain answers
One question at a time — Ask focused questions for the best accuracy
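One way to act on "Check confidence" is to accept high-scoring answers automatically and route the rest to a person. The sketch below is hypothetical: `triageByConfidence` and `ScoredAnswer` are illustrative, you attach the question to each result yourself (askDocument() returns the answer and score), and the 0.5 default only mirrors the `isReliable` check in the Invoice QA example, so tune it on your own documents.

```typescript
// A question paired with the answer and score returned for it.
interface ScoredAnswer {
  question: string;
  answer: string;
  score: number;
}

// Hypothetical helper: split results into auto-accepted and needs-review groups.
function triageByConfidence(
  results: readonly ScoredAnswer[],
  threshold = 0.5,
): { accepted: ScoredAnswer[]; needsReview: ScoredAnswer[] } {
  const accepted: ScoredAnswer[] = [];
  const needsReview: ScoredAnswer[] = [];
  for (const result of results) {
    // Strict comparison, as in the example: a score equal to the threshold needs review.
    (result.score > threshold ? accepted : needsReview).push(result);
  }
  return { accepted, needsReview };
}
```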

Florence-2 is an end-to-end model — it reads the document image directly without needing separate OCR. For text-based Q&A (where you already have the text), use Question Answering instead.

Showcase Apps

App	Description	Links
Invoice QA	Ask questions about invoice and document images	Demo · Source


Next Steps

Core Document QA API

OCR

Question Answering
