Document QA
Answer questions about document images — invoices, forms, receipts — using document understanding models like Florence-2. The model reads the document image and extracts answers without separate OCR.
For the full API reference (askDocument(), askTable(), options, result types, and custom providers), see the Core Document QA guide.
See it in action
Try Invoice QA for a working demo.
Recommended Models
| Model | Size | Use Case |
|---|---|---|
| onnx-community/Florence-2-base-ft | ~223MB | Document visual Q&A (invoices, forms, receipts) |
| Xenova/donut-base-finetuned-docvqa | ~218MB | OCR-free document understanding (legacy) |
Invoice QA Example
Based on the Invoice QA showcase app:
import { transformers } from '@localmode/transformers';
import { askDocument } from '@localmode/core';

const model = transformers.documentQA('onnx-community/Florence-2-base-ft');

// Lets the caller cancel an in-flight request with controller.abort()
const controller = new AbortController();

async function askAboutDocument(imageDataUrl: string, question: string) {
  const { answer, score } = await askDocument({
    model,
    document: imageDataUrl,
    question,
    abortSignal: controller.signal,
  });

  return {
    answer,
    confidence: score,
    // Treat scores above 0.5 as reliable; adjust the cutoff for your documents
    isReliable: score > 0.5,
  };
}
// Example questions for invoices:
// - "What is the invoice number?"
// - "What is the total amount?"
// - "What is the date?"
// - "Who is the vendor?"
Table QA
The same documentQA() model also supports question answering on tabular data via askTable():
import { askTable } from '@localmode/core';
const { answer, score, aggregator, cells } = await askTable({
  model,
  table: {
    headers: ['Product', 'Q1 Sales', 'Q2 Sales'],
    rows: [
      ['Widget A', '1200', '1500'],
      ['Widget B', '800', '950'],
      ['Widget C', '2000', '2200'],
    ],
  },
  question: 'Which product had the highest Q2 sales?',
});
console.log(answer); // 'Widget C'
console.log(aggregator); // e.g., 'NONE' or 'MAX'
console.log(cells); // ['2200']
Document Input Formats
The document parameter accepts:
- string — Data URL of the document image
- Blob — Image blob from file input or camera capture
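Because the string form is expected to be a data URL, it can help to check a string before passing it as the document. The following is a minimal sketch of such a pre-check, not part of the library: isImageDataUrl and IMAGE_DATA_URL are hypothetical names, and the assumption that only image/* media types are useful here is ours.

```typescript
// Hypothetical helper, not part of @localmode/core.
// Assumption: a usable document string is a data: URI with an image/* media type,
// optionally followed by parameters such as ";base64", then a comma and the payload.
const IMAGE_DATA_URL = /^data:image\/[a-z0-9.+-]+(;[^,]*)?,/i;

// Returns true when the string looks like an image data URL.
// This checks the prefix only; it does not decode or validate the image bytes.
function isImageDataUrl(value: string): boolean {
  return IMAGE_DATA_URL.test(value);
}
```

A plain https:// link or a data URL with a non-image media type fails this check, so a caller could fetch or convert the image before asking a question.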
Best Practices
Document QA Tips
- Ask specific questions — "What is the total?" works better than "Tell me about this document"
- Use clear images — High-resolution scans give much better results
- Check confidence — Low scores indicate uncertain answers
- One question at a time — Ask focused questions for the best accuracy
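The last two tips can be combined in a small sketch: ask each question in its own call, then flag low-scoring answers for review. Everything named here (QAResult, triage, askEach, and the injected ask function) is hypothetical and only mirrors the answer and score fields from the invoice example; the 0.5 default is the same cutoff used there and is an assumption, not a recommended value.

```typescript
// Hypothetical shape mirroring the { answer, score } fields used in the invoice example.
interface QAResult {
  answer: string;
  score: number;
}

type Triage = 'accept' | 'review';

interface TriagedAnswer extends QAResult {
  question: string;
  status: Triage;
}

// Marks an answer for manual review unless its score is strictly above the threshold.
function triage(result: QAResult, threshold = 0.5): Triage {
  return result.score > threshold ? 'accept' : 'review';
}

// Asks one focused question per call, in order, and triages each answer.
// `ask` is injected by the caller (for example, a wrapper around askDocument),
// so this sketch does not depend on any particular model or provider.
async function askEach(
  questions: string[],
  ask: (question: string) => Promise<QAResult>,
  threshold = 0.5,
): Promise<TriagedAnswer[]> {
  const results: TriagedAnswer[] = [];
  for (const question of questions) {
    const result = await ask(question);
    results.push({ ...result, question, status: triage(result, threshold) });
  }
  return results;
}
```

Injecting ask keeps the loop independent of how the model is called. Answers marked 'review' could be shown to the user for confirmation or retried with a clearer image.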
Florence-2 is an end-to-end model — it reads the document image directly without needing separate OCR. For text-based Q&A (where you already have the text), use Question Answering instead.