How Invofox measures accuracy in document extraction.

We evaluate, normalize and validate data extraction accuracy across millions of documents — using a transparent, ground-truth benchmarking process built for real-world evals.

Per-field: 99.4% field-level accuracy
Per-doc: 99.2% document-level accuracy
Volume: 12.4M documents evaluated (+18% MoM)
Quality: 0% false positives (SLA bound)

Ground truth: the starting point for measuring accuracy.

Accurate benchmarking starts with an accurate baseline — what we call the ground truth. It defines the correct data for every field, so we can measure extraction accuracy objectively across your pipeline. When a customer shares their labeled data, we use it as the standard reference.

Model output
{
  "document_number": "INV-2024-1837",
  "issued_at": "2024-08-14",
  "tax_base": 1452.30,
  "vat_rate": 0.21,
  "total": 1757.28
}
Ground truth
{
  "document_number": "INV-2024-1873",
  "issued_at": "2024-08-14",
  "tax_base": 1452.40,
  "vat_rate": 0.21,
  "total": 1757.28
}
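
As a rough illustration of what comparing a model output against ground truth involves, the sketch below diffs the two records shown above field by field. The function name `diff_fields` is hypothetical and the comparison is plain equality; it does not represent Invofox's actual evaluation logic, which applies type-specific rules.

```python
# Values copied from the example records above.
model_output = {
    "document_number": "INV-2024-1837",
    "issued_at": "2024-08-14",
    "tax_base": 1452.30,
    "vat_rate": 0.21,
    "total": 1757.28,
}
ground_truth = {
    "document_number": "INV-2024-1873",
    "issued_at": "2024-08-14",
    "tax_base": 1452.40,
    "vat_rate": 0.21,
    "total": 1757.28,
}


def diff_fields(predicted: dict, expected: dict) -> dict:
    """Return {field: (predicted, expected)} for every field that differs.

    Hypothetical helper: uses strict equality, with no tolerance or
    normalization applied.
    """
    mismatches = {}
    for key, expected_value in expected.items():
        predicted_value = predicted.get(key)
        if predicted_value != expected_value:
            mismatches[key] = (predicted_value, expected_value)
    return mismatches


mismatches = diff_fields(model_output, ground_truth)
for field, (got, want) in mismatches.items():
    print(f"{field}: extracted {got!r}, expected {want!r}")
```

In this example two of the five fields differ: the document number has two transposed digits and the tax base is off by 0.10.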

How Invofox handles complex data in accuracy evaluation.

Document data rarely looks identical, even when it's correct. Our evaluation logic adapts to each data type — so comparisons remain fair and consistent across the board.

Numbers

1.234,56 = 1,234.56
Tolerance ±0.01

Compared within tolerance ranges. Trailing-zero, separator and currency-format mismatches don't break the match.
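
A minimal sketch of tolerance-based number matching, assuming a simple heuristic in which the last separator in the string is the decimal mark. The names `parse_amount` and `amounts_match` are hypothetical, and the ±0.01 default mirrors the tolerance shown above.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse '1.234,56' or '1,234.56' into a Decimal.

    Assumption: whichever of ',' or '.' appears last is the decimal mark.
    This is ambiguous for inputs such as '1,234' (read here as 1.234), and
    inputs with repeated dots and no comma are rejected by Decimal.
    """
    # Drop currency symbols, spaces and anything else that is not numeric.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ",.-")
    if cleaned.rfind(",") > cleaned.rfind("."):
        # Comma is the decimal mark: remove dots, turn the comma into a dot.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal mark (or there is none): remove commas.
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)


def amounts_match(a: str, b: str, tolerance: Decimal = Decimal("0.01")) -> bool:
    """True when both amounts are within the given tolerance of each other."""
    return abs(parse_amount(a) - parse_amount(b)) <= tolerance


print(amounts_match("1.234,56", "1,234.56"))
print(amounts_match("1452.30", "1452.40"))
```

Using `Decimal` rather than `float` keeps the tolerance check exact at the boundary.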

Dates

14/08/24 → 2024-08-14
ISO 8601

Normalized to ISO 8601. Time-zone differences and locale formats are reconciled automatically.
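
One way to sketch date normalization is to try a list of candidate formats and emit an ISO 8601 date. The format list and the day-first assumption are illustrative choices, not Invofox's documented behavior, and time-zone handling is left out.

```python
from datetime import datetime

# Assumed candidate formats, day-first for slash- and dot-separated dates.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d/%m/%y", "%d.%m.%Y")


def to_iso_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each candidate format in turn."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Not this format, try the next one.
    raise ValueError(f"Unrecognized date format: {text!r}")


def dates_match(a: str, b: str) -> bool:
    """Compare two dates after normalizing both to ISO 8601."""
    return to_iso_date(a) == to_iso_date(b)


print(to_iso_date("14/08/24"))
print(dates_match("14/08/24", "2024-08-14"))
```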

Booleans

false
Unchecked ⇒ false

Missing and unchecked states are accounted for: "—" is treated as false unless the schema specifies otherwise.
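
The rule above could be sketched as follows. The token sets and the `missing_as` parameter (standing in for a schema-level override) are assumptions made for illustration.

```python
# Assumed token sets; a real schema would define its own.
MISSING_TOKENS = {"", "—", "-"}
TRUE_TOKENS = {"true", "yes", "checked", "x", "1"}
FALSE_TOKENS = {"false", "no", "unchecked", "0"}


def normalize_bool(value, missing_as=False):
    """Map an extracted checkbox or flag value to a bool.

    Missing values (None, empty string, dash) become `missing_as`, which
    defaults to False and acts as the hypothetical schema override.
    """
    if isinstance(value, bool):
        return value
    if value is None:
        return missing_as
    token = str(value).strip().lower()
    if token in MISSING_TOKENS:
        return missing_as
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


print(normalize_bool("—"))
print(normalize_bool("Checked"))
```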

Arrays & tables

[A, B, C] = [C, A, B]
Order-agnostic

Evaluated by content, not order — unless the order is business-critical for the use case.
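
Order-agnostic comparison can be sketched as a multiset comparison, with an `ordered` flag for the cases where sequence matters. The function name and flag are hypothetical, and items are assumed to be hashable.

```python
from collections import Counter


def arrays_match(predicted, expected, ordered: bool = False) -> bool:
    """Compare two arrays by content, or by content and position if ordered.

    Counter compares element counts, so duplicates are respected:
    [A, A, B] does not match [A, B, B]. Table rows stored as dicts would
    first need converting to something hashable, e.g. tuple(sorted(row.items())).
    """
    if ordered:
        return list(predicted) == list(expected)
    return Counter(predicted) == Counter(expected)


print(arrays_match(["A", "B", "C"], ["C", "A", "B"]))
print(arrays_match(["A", "B", "C"], ["C", "A", "B"], ordered=True))
```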

Texts & strings

  • INV-1873 = INV-1873 (exact)
  • Acme Co. = ACME CO (normalized)
  • C/. Mayor 1 ≈ Calle Mayor 1 (Levenshtein 0.92)

Exact, normalized and similarity-based (Levenshtein) matching depending on the field type.
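
The three matching modes might look like the sketch below. The normalization rules, the similarity formula (1 minus edit distance divided by the longer length) and the 0.9 threshold are all assumptions; the page does not specify how its scores are derived.

```python
import re


def normalize(text: str) -> str:
    """Strip punctuation, collapse whitespace and ignore case."""
    return " ".join(re.sub(r"[^\w\s]", "", text).casefold().split())


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed score: 1.0 for identical strings, lower as edits increase."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def strings_match(predicted: str, expected: str,
                  mode: str = "exact", threshold: float = 0.9) -> bool:
    """Dispatch on a per-field matching mode (hypothetical names)."""
    if mode == "exact":
        return predicted == expected
    if mode == "normalized":
        return normalize(predicted) == normalize(expected)
    if mode == "fuzzy":
        return similarity(normalize(predicted), normalize(expected)) >= threshold
    raise ValueError(f"Unknown mode: {mode!r}")


print(strings_match("INV-1873", "INV-1873"))
print(strings_match("Acme Co.", "ACME CO", mode="normalized"))
```

Identifiers such as invoice numbers would typically use the exact mode, while names and addresses are better candidates for the normalized or fuzzy modes.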

From field accuracy to full-document reliability.

Most IDP vendors only report field-level accuracy. Invofox goes further by measuring document-level accuracy too — because even a single wrong field can stop an automated workflow. We compute both: per-field for granular analytics, per-document for end-to-end reliability, plus custom validation rules per use case.

Per-document 99.0%

End-to-end reliability

Every field must be correct for a document to count. The signal a downstream workflow can actually trust.

Whole-document accuracy across the pipeline
Per-field 99.5%

Granular analytics

Field-level precision and recall across millions of extracted keys. Perfect for monitoring and dashboards.

Per-key precision & recall, contractually committed
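
To make the difference between the two metrics concrete, here is a small sketch that computes both from the same per-field results. The input shape (one dict of field name to correct/incorrect per document) and the function names are assumptions, and the sample data is invented for illustration.

```python
def field_accuracy(results: list[dict[str, bool]]) -> float:
    """Share of individual fields that were extracted correctly."""
    total = sum(len(doc) for doc in results)
    correct = sum(sum(doc.values()) for doc in results)
    return correct / total


def document_accuracy(results: list[dict[str, bool]]) -> float:
    """Share of documents in which every field was extracted correctly."""
    return sum(all(doc.values()) for doc in results) / len(results)


# Illustrative data: three documents, four fields each, one wrong field.
results = [
    {"document_number": True, "issued_at": True, "tax_base": True, "total": True},
    {"document_number": False, "issued_at": True, "tax_base": True, "total": True},
    {"document_number": True, "issued_at": True, "tax_base": True, "total": True},
]

print(f"field-level:    {field_accuracy(results):.1%}")
print(f"document-level: {document_accuracy(results):.1%}")
```

A single wrong field out of twelve leaves field-level accuracy at 11/12 but drops document-level accuracy to 2/3, which is why the document-level figure is the stricter of the two.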

Want to see how your vendor compares?

Benchmark your current IDP or in-house system against Invofox — we'll show you the data side by side.

Benchmark consistency when your schema evolves.

Adding or removing a field can make old benchmarks impossible to compare. Invofox tracks schema versions and normalizes changes automatically — so your accuracy results stay valid over time. When new keys appear, we flag affected documents so you keep clear visibility into your evolving data model.

v1.0 (Jan): 4 fields
  • document_number
  • issued_at
  • tax_base
  • total
v1.1 (Mar): 5 fields
  • document_number
  • issued_at
  • tax_base
  • total
  • currency
+1 field added: compatible
v2.0 (Jun): 5 fields
  • document_number
  • issue_date
  • tax_base
  • total
  • currency
1 field renamed (issued_at → issue_date): normalized
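
A minimal sketch of aligning records across the schema versions above, assuming a rename map from old to new field names. `RENAMES`, `to_latest` and `comparable_fields` are hypothetical names, and the `currency` value is illustrative.

```python
# Assumed rename map: older field name -> v2.0 field name.
RENAMES = {"issued_at": "issue_date"}


def to_latest(record: dict) -> dict:
    """Rewrite a record's keys to the latest schema's field names."""
    return {RENAMES.get(key, key): value for key, value in record.items()}


def comparable_fields(old_record: dict, new_record: dict):
    """Return (shared, added): fields usable for comparison across both
    versions, and fields present only in the newer record."""
    old, new = to_latest(old_record), to_latest(new_record)
    shared = sorted(old.keys() & new.keys())
    added = sorted(new.keys() - old.keys())
    return shared, added


v1_0 = {"document_number": "INV-2024-1873", "issued_at": "2024-08-14",
        "tax_base": 1452.40, "total": 1757.28}
v2_0 = {"document_number": "INV-2024-1873", "issue_date": "2024-08-14",
        "tax_base": 1452.40, "total": 1757.28, "currency": "EUR"}

shared, added = comparable_fields(v1_0, v2_0)
print("comparable:", shared)
print("flag as new:", added)
```

Benchmarks computed only over the shared fields stay comparable between versions, while the added fields can be reported separately.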

Accuracy evaluation built on transparency.

We believe accuracy metrics should be verifiable, not subjective. Every eval runs in-house with consistent parameters and transparent rules. Each customer receives both summary metrics and the raw data used to calculate them — no black boxes, no hidden assumptions.

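Client vs. Invofox, accuracy and false positives per field:
  • Document number: Client 89.4% (false positives 5.4%), Invofox 99.3% (false positives 0.0%)
  • Tax base amount: Client 87.9% (false positives 3.8%), Invofox 98.8% (false positives 0.0%)
  • OrderRef: Client 88.7% (false positives 6.2%), Invofox 99.1% (false positives 0.0%)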
+10.4pp Avg accuracy gain
−5.1pp False positives reduced
3 Fields measured
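
The summary figures can be reproduced from the per-field numbers above, assuming they are simple unweighted averages across the three fields:

```python
from statistics import mean

# (accuracy %, false-positive %) per field, copied from the comparison above.
client = {"Document number": (89.4, 5.4),
          "Tax base amount": (87.9, 3.8),
          "OrderRef": (88.7, 6.2)}
invofox = {"Document number": (99.3, 0.0),
           "Tax base amount": (98.8, 0.0),
           "OrderRef": (99.1, 0.0)}

# Average difference in percentage points, field by field.
accuracy_gain = mean(invofox[f][0] - client[f][0] for f in client)
fp_reduction = mean(client[f][1] - invofox[f][1] for f in client)

print(f"Avg accuracy gain: +{accuracy_gain:.1f}pp")
print(f"False positives reduced: -{fp_reduction:.1f}pp")
```

The per-field gains are 9.9, 10.9 and 10.4 percentage points, which average to 10.4; the false-positive reductions of 5.4, 3.8 and 6.2 average to roughly 5.1.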

Frequently asked questions about accuracy evaluation.

What happens if we change our schema mid-test?

Invofox tracks schema versions automatically, aligning field definitions across updates so results remain comparable. You'll always know whether changes come from real performance gains or schema adjustments.

Still have questions? Talk to us