How Invofox measures accuracy in document extraction.

We measure data extraction accuracy across millions of documents using a transparent, ground-truth benchmarking process built for real-world evaluations: extracted data is normalized and validated before it is scored.


Per-field: 99.4% field-level accuracy
Per-doc: 99.2% document-level accuracy
Volume: 12.4M documents evaluated (+18% MoM)
Quality: false positives bound by SLA


Ground truth: the starting point for measuring accuracy.

Accurate benchmarking starts with an accurate baseline — what we call the ground truth. It defines the correct data for every field, so we can measure extraction accuracy objectively across your pipeline. When a customer shares their labeled data, we use it as the standard reference.

Model output

{
  "document_number": "INV-2024-1837",
  "issued_at": "2024-08-14",
  "tax_base": 1452.30,
  "vat_rate": 0.21,
  "total": 1757.28
}

Ground truth

{
  "document_number": "INV-2024-1873",
  "issued_at": "2024-08-14",
  "tax_base": 1452.40,
  "vat_rate": 0.21,
  "total": 1757.28
}

Comparison: document_number (INV-2024-1837 vs INV-2024-1873) and tax_base (1452.30 vs 1452.40) do not match the ground truth; issued_at, vat_rate and total do.
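To make the comparison concrete, here is a minimal Python sketch that scores the model output against the ground truth field by field. It assumes exact equality for strings and dates and a ±0.01 tolerance for numbers; `field_matches` and `compare` are hypothetical helpers, not part of any Invofox API.

```python
from decimal import Decimal

# Values copied from the example above.
model_output = {
    "document_number": "INV-2024-1837",
    "issued_at": "2024-08-14",
    "tax_base": 1452.30,
    "vat_rate": 0.21,
    "total": 1757.28,
}
ground_truth = {
    "document_number": "INV-2024-1873",
    "issued_at": "2024-08-14",
    "tax_base": 1452.40,
    "vat_rate": 0.21,
    "total": 1757.28,
}

TOLERANCE = Decimal("0.01")  # assumed numeric tolerance


def field_matches(expected, actual) -> bool:
    """Numbers match within TOLERANCE; everything else must be equal."""
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        # str() keeps the digits as written, avoiding binary float rounding in the subtraction.
        return abs(Decimal(str(expected)) - Decimal(str(actual))) <= TOLERANCE
    return expected == actual


def compare(expected: dict, actual: dict) -> dict:
    """Return a per-field True/False map keyed by the ground-truth fields."""
    return {key: field_matches(value, actual.get(key)) for key, value in expected.items()}


result = compare(ground_truth, model_output)
mismatches = [key for key, ok in result.items() if not ok]
print(mismatches)  # ['document_number', 'tax_base']
print(f"{sum(result.values())}/{len(result)} fields correct")  # 3/5 fields correct
```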

How Invofox handles complex data in accuracy evaluation.

Correct data rarely looks identical across documents: the same value can be written in several formats. Our evaluation logic adapts to each data type, so comparisons remain fair and consistent across the board.


Numbers

1.234,56 = 1,234.56

Tolerance ±0.01

Compared within tolerance ranges. Trailing-zero, separator and currency-format mismatches don't break the match.
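A minimal sketch of tolerance-based number matching, assuming the rightmost `.` or `,` is the decimal mark. `parse_amount` and `numbers_match` are hypothetical names; a production parser would also need locale hints to resolve values such as `1,234`, which this heuristic reads as 1.234.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse '1.234,56', '1,234.56', '€1.234,56' or '1452.3' into a Decimal.

    Assumption: when a separator is present, the rightmost one is the decimal mark.
    """
    # Drop currency symbols, spaces and anything that is not a digit, sign or separator.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ",.-")
    decimal_pos = max(cleaned.rfind("."), cleaned.rfind(","))
    if decimal_pos == -1:
        return Decimal(cleaned)
    integer_part = cleaned[:decimal_pos].replace(".", "").replace(",", "")
    fraction_part = cleaned[decimal_pos + 1:]
    return Decimal(f"{integer_part}.{fraction_part}")


def numbers_match(a: str, b: str, tolerance: Decimal = Decimal("0.01")) -> bool:
    """True when both amounts agree within the tolerance (±0.01 by default)."""
    return abs(parse_amount(a) - parse_amount(b)) <= tolerance


print(numbers_match("1.234,56", "1,234.56"))  # True: separator style differs
print(numbers_match("1452.3", "1452.30"))     # True: trailing zero differs
print(numbers_match("€1.234,56", "1234.56"))  # True: currency symbol ignored
print(numbers_match("1452.30", "1452.40"))    # False: off by 0.10
```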


Dates

14/08/24 → 2024-08-14

ISO 8601

Normalized to ISO 8601. Time-zone differences and locale formats are reconciled automatically.
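Date normalization can be sketched with Python's standard `datetime.strptime`, trying a list of accepted layouts and emitting ISO 8601 via `date.isoformat()`. The format list and the day-first assumption are illustrative, and time-zone reconciliation is left out of this sketch.

```python
from datetime import datetime

# Hypothetical list of accepted layouts; day-first order is an assumption
# (a US-style 08/14/24 would need its own format and a locale decision).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%y", "%d/%m/%Y", "%d.%m.%Y", "%d-%m-%Y")


def normalize_date(text: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"Unrecognized date format: {text!r}")


def dates_match(a: str, b: str) -> bool:
    return normalize_date(a) == normalize_date(b)


print(normalize_date("14/08/24"))              # 2024-08-14
print(dates_match("14/08/24", "2024-08-14"))  # True
```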


Booleans

— → false

Unchecked ⇒ false

Missing and unchecked states are handled explicitly: "—" is treated as false unless the schema specifies otherwise.
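One way to express the "missing means false unless the schema says otherwise" rule, as a sketch: the token sets and the `missing_as` parameter (standing in for a schema setting) are assumptions for illustration.

```python
from typing import Optional, Union

TRUE_TOKENS = {"true", "yes", "y", "x", "1", "checked", "✓"}
FALSE_TOKENS = {"false", "no", "n", "0", "unchecked"}
MISSING_TOKENS = {"", "—", "-", "n/a"}


def normalize_bool(value: Union[str, bool, None], missing_as: Optional[bool] = False) -> Optional[bool]:
    """Map checkbox-like values to True/False.

    Missing values ('—', empty, None) become `missing_as`: False by default,
    or None when a (hypothetical) schema setting requires an explicit value.
    """
    if value is None:
        return missing_as
    if isinstance(value, bool):
        return value
    token = value.strip().lower()
    if token in MISSING_TOKENS:
        return missing_as
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"Not a boolean-like value: {value!r}")


print(normalize_bool("—"))                   # False
print(normalize_bool("—", missing_as=None))  # None
print(normalize_bool("Yes"))                 # True
```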


Arrays & tables

[A, B, C] = [C, A, B]

Order-agnostic

Evaluated by content, not order — unless the order is business-critical for the use case.
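Order-agnostic comparison can be modeled as multiset equality with `collections.Counter`, which, unlike a plain `set`, still catches duplicated or dropped rows. This sketch assumes table rows are flat dicts with hashable values; `arrays_match` and its `ordered` flag are hypothetical.

```python
from collections import Counter
from typing import Any, Hashable, Sequence


def _row_key(row: Any) -> Hashable:
    """Make table rows (dicts) hashable so they can be counted."""
    if isinstance(row, dict):
        return tuple(sorted(row.items()))
    return row


def arrays_match(expected: Sequence[Any], actual: Sequence[Any], ordered: bool = False) -> bool:
    """Compare by content as a multiset; set ordered=True when order is business-critical."""
    if ordered:
        return list(expected) == list(actual)
    return Counter(map(_row_key, expected)) == Counter(map(_row_key, actual))


print(arrays_match(["A", "B", "C"], ["C", "A", "B"]))                # True
print(arrays_match(["A", "B", "C"], ["C", "A", "B"], ordered=True))  # False
print(arrays_match(["A", "A", "B"], ["A", "B", "B"]))                # False: a set would say True
```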


Texts & strings

INV-1873 vs INV-1873: exact match
Acme Co. vs ACME CO: normalized match
C/. Mayor 1 vs Calle Mayor 1: Levenshtein similarity 0.92

Exact, normalized and similarity-based (Levenshtein) matching depending on the field type.
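The three string strategies can be sketched as follows. The normalization steps (Unicode NFKC, case folding, punctuation removal, whitespace collapsing), the similarity formula `1 - distance / max_length` and the 0.9 threshold are all assumptions. This sketch does not expand abbreviations such as "C/.", so it scores "C/. Mayor 1" against "Calle Mayor 1" at about 0.69 rather than the 0.92 shown above; a score like that would depend on extra normalization not described here.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Case-fold, drop punctuation and collapse whitespace (assumed normalization)."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every character differs."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def match_string(expected: str, actual: str, mode: str = "exact", threshold: float = 0.9) -> bool:
    if mode == "exact":
        return expected == actual
    if mode == "normalized":
        return normalize_text(expected) == normalize_text(actual)
    if mode == "similarity":
        return similarity(normalize_text(expected), normalize_text(actual)) >= threshold
    raise ValueError(f"Unknown match mode: {mode!r}")


print(match_string("INV-1873", "INV-1873"))                     # True
print(match_string("Acme Co.", "ACME CO", mode="normalized"))  # True
print(round(similarity(normalize_text("C/. Mayor 1"), normalize_text("Calle Mayor 1")), 2))  # 0.69
```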

From field accuracy to full-document reliability.

Most IDP (intelligent document processing) vendors report only field-level accuracy. Invofox also measures document-level accuracy, because a single wrong field can stop an automated workflow. We compute both: per-field for granular analytics and per-document for end-to-end reliability, plus custom validation rules for each use case.
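Validation rules are use-case specific and are not spelled out here. As a purely hypothetical example for invoices, a rule could require `tax_base × (1 + vat_rate)` to equal `total` within 0.01. The sketch uses `decimal.Decimal` to avoid binary floating-point rounding; the function name and the numbers are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def total_is_consistent(doc: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Hypothetical invoice rule: tax_base * (1 + vat_rate) must equal total within tolerance."""
    base = Decimal(str(doc["tax_base"]))
    rate = Decimal(str(doc["vat_rate"]))
    total = Decimal(str(doc["total"]))
    # Round the computed total to cents before comparing.
    expected_total = (base * (1 + rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return abs(expected_total - total) <= tolerance


print(total_is_consistent({"tax_base": 100.00, "vat_rate": 0.21, "total": 121.00}))  # True
print(total_is_consistent({"tax_base": 100.00, "vat_rate": 0.21, "total": 112.10}))  # False
```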


Per-document 99.0%

End-to-end reliability

Every field must be correct for a document to count. The signal a downstream workflow can actually trust.

Per-field 99.5%

Granular analytics

Field-level precision and recall across millions of extracted keys. Perfect for monitoring and dashboards.
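The difference between the two metrics is easy to see on a small batch. In this sketch (hypothetical helper and data), four documents with five fields each contain a single wrong field: per-field accuracy barely moves, while per-document accuracy loses a whole document.

```python
from typing import Dict, List


def accuracy_metrics(results: List[Dict[str, bool]]) -> Dict[str, float]:
    """Each item maps field name -> whether that field matched the ground truth."""
    total_fields = sum(len(doc) for doc in results)
    correct_fields = sum(sum(doc.values()) for doc in results)
    # A document only counts when every one of its fields is correct.
    correct_docs = sum(all(doc.values()) for doc in results)
    return {
        "per_field": correct_fields / total_fields,
        "per_document": correct_docs / len(results),
    }


fields = ["document_number", "issued_at", "tax_base", "vat_rate", "total"]
batch = [dict.fromkeys(fields, True) for _ in range(4)]
batch[2]["tax_base"] = False  # one wrong field in one document

print(accuracy_metrics(batch))  # {'per_field': 0.95, 'per_document': 0.75}
```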

Want to see how your vendor compares?

Benchmark your current IDP or in-house system against Invofox — we'll show you the data side by side.


Benchmark consistency when your schema evolves.

Adding or removing a field can make old benchmarks impossible to compare. Invofox tracks schema versions and normalizes changes automatically — so your accuracy results stay valid over time. When new keys appear, we flag affected documents so you keep clear visibility into your evolving data model.

v1.0 (Jan): 4 fields

document_number
issued_at
tax_base
total

v1.1 (Mar): 5 fields

document_number
issued_at
tax_base
total
currency

+1 field added (currency): compatible

v2.0 (Jun): 5 fields

document_number
issue_date
tax_base
total
currency

1 field renamed (issued_at → issue_date): normalized
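Normalizing a rename like the v2.0 change can be sketched as a key mapping applied before comparison, with any keys the older schema lacks reported separately so affected documents can be flagged. The rename map, helper name and `"EUR"` value are illustrative assumptions, not Invofox's internal format.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical rename map for the v1.1 -> v2.0 change shown above.
RENAMES_V2_TO_V1 = {"issue_date": "issued_at"}


def align_to_schema(record: Dict[str, object], target_fields: Iterable[str],
                    renames: Dict[str, str]) -> Tuple[Dict[str, object], List[str]]:
    """Rename keys to the target schema; return comparable fields and keys the target lacks."""
    targets = set(target_fields)
    renamed = {renames.get(key, key): value for key, value in record.items()}
    comparable = {key: value for key, value in renamed.items() if key in targets}
    new_fields = sorted(key for key in renamed if key not in targets)
    return comparable, new_fields


v1_fields = ["document_number", "issued_at", "tax_base", "total"]
v2_record = {  # illustrative values
    "document_number": "INV-2024-1873",
    "issue_date": "2024-08-14",
    "tax_base": 1452.40,
    "total": 1757.28,
    "currency": "EUR",
}
comparable, new_fields = align_to_schema(v2_record, v1_fields, RENAMES_V2_TO_V1)
print(sorted(comparable))  # ['document_number', 'issued_at', 'tax_base', 'total']
print(new_fields)          # ['currency']
```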

Accuracy evaluation built on transparency.

We believe accuracy metrics should be verifiable, not subjective. Every eval runs in-house with consistent parameters and transparent rules. Each customer receives both summary metrics and the raw data used to calculate them — no black boxes, no hidden assumptions.

Per-field accuracy and false positives, client vs Invofox:

Field             Client accuracy   Client false positives   Invofox accuracy   Invofox false positives
Document number   89.4%             5.4%                     99.3%              0.0%
Tax base amount   87.9%             3.8%                     98.8%              0.0%
OrderRef          88.7%             6.2%                     99.1%              0.0%
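The page does not define "false positive" precisely. A common convention, assumed in this sketch, counts a field as a false positive when the system returns a value that does not match the ground truth, while an empty field is wrong but not a false positive. The helper and the order references are hypothetical.

```python
from typing import Dict, List, Optional


def field_report(pairs: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Accuracy and false-positive rate for one field across documents (assumed definitions)."""
    total = len(pairs)
    # Correct: extracted value equals the ground truth (both may be empty).
    correct = sum(p["extracted"] == p["truth"] for p in pairs)
    # False positive: a value was returned, but it is wrong.
    false_positives = sum(
        p["extracted"] is not None and p["extracted"] != p["truth"] for p in pairs
    )
    return {"accuracy": correct / total, "false_positive_rate": false_positives / total}


pairs = [
    {"truth": "PO-1001", "extracted": "PO-1001"},  # correct
    {"truth": "PO-1002", "extracted": "PO-1020"},  # wrong value: false positive
    {"truth": "PO-1003", "extracted": None},       # left empty: wrong, not a false positive
    {"truth": None, "extracted": None},            # correctly empty
]
print(field_report(pairs))  # {'accuracy': 0.5, 'false_positive_rate': 0.25}
```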

Frequently asked questions about accuracy evaluation.

[
  {
    "question": "What happens if we change our schema mid-test?",
    "answer": "Invofox tracks schema versions automatically, aligning field definitions across updates so results remain comparable. You'll always know whether changes come from real performance gains or schema adjustments."
  },
  {
    "question": "Can we access the raw benchmark results?",
    "answer": "Yes. Every customer receives the full report plus all raw outputs. You can verify or replicate our results anytime for complete transparency."
  },
  {
    "question": "How do you ensure fairness when comparing vendors?",
    "answer": "Both systems process the same documents under identical rules and thresholds. Invofox then shares all results side by side for objective comparison."
  },
  {
    "question": "What industries or document types is this best for?",
    "answer": "Our process is designed for any high-volume, accuracy-critical workflow, including invoices, bank statements, mortgage packages, insurance forms and more."
  }
]



Ready to see how Invofox measures accuracy?

Get a transparent benchmark on your own documents — same rules, same thresholds, full visibility.
