Proof, not promises.

During a proof of concept, Invofox delivers a detailed performance report showing exactly how your documents perform — across accuracy, errors, vendors, processing times, precision and real-world edge cases.




How we turn accuracy into evidence.

From the first documents processed, accuracy is never a vague promise — it's a defined, measured outcome.

Sample + ground truth

You share a small representative sample of your documents with the corresponding ground truth for evaluation.
Define success upfront

We analyze the sample and schema, split it into parts, and set targets: accuracy thresholds, confidence levels and evaluation criteria.
Continuous improvement loop

We iterate to refine the pipeline, adjust models and fine-tune the solution for maximum accuracy.
Process the full volume

We run thousands of documents and pages — not cherry-picked examples — through the production pipeline.
Deliver the performance report

A clear, visual breakdown of results you can confidently share with engineering, ops and exec stakeholders.

A report built for real decisions.

Designed to be shared internally, with technical teams, operators and executives, without additional explanation. Every report includes:

Overall accuracy (field-level and document-level)
Processing time benchmarks (P95 / P99)
Error distribution and failure types
Missing or low-confidence fields
Confidence thresholds and warnings
Page-level and document-level performance
Splitter and classifier performance
Custom analysis relevant to your use case

Example report preview: Invofox · Performance Report · 8,432 docs · 47 vendors · Q3 2025

Overall accuracy: 99.4% (+0.7 pp vs Q2), field-level, ground-truth validated, with a Q1 to Q3 trend
Processing time benchmarks: P50 8.0 s · P95 10.5 s · P99 12.0 s
Error distribution by failure type: Missing info 38% · Ambiguous 27% · Low confidence 22% · Edge case 13%
Performance by granularity (page-level, document-level, splitter, classifier): Pages 99.2% · Documents 99.4% · Splits 98.7% · Classify 99.1%
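The report doesn't state which percentile definition its P50/P95/P99 benchmarks use. As a hedged illustration only, here is a minimal sketch of the nearest-rank method applied to hypothetical per-document processing times; `percentile` and `latency_summary` are illustrative names, not an Invofox API.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def latency_summary(samples: List[float]) -> Dict[str, float]:
    """P50 / P95 / P99 of per-document processing times."""
    return {f"P{p}": percentile(samples, p) for p in (50, 95, 99)}


# Hypothetical processing times in seconds (made up, not report data).
times = [7.2, 7.9, 8.0, 8.1, 8.4, 9.0, 9.6, 10.5, 11.8, 12.0]
summary = latency_summary(times)
```

With only ten samples, P95 and P99 both land on the slowest document, which is why tail benchmarks only mean something over a large volume. Interpolating definitions, such as NumPy's default `percentile`, would return slightly different values for the same data.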

All results are calculated using a defined evaluation methodology based on ground-truth comparisons — and shared transparently.
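The page doesn't publish the exact comparison rules, so the following is a hypothetical sketch of how field-level and document-level accuracy can be derived from ground truth. It assumes exact matching after a simple normalization; the field names, the `normalize` rule and the record layout are all illustrative assumptions.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]  # field name -> value (hypothetical schema)


def normalize(value: Optional[str]) -> Optional[str]:
    # Hypothetical rule: trim whitespace and ignore case. Real comparisons
    # would likely need per-field rules (dates, amounts, IDs).
    return value.strip().casefold() if value is not None else None


def _check_aligned(predicted: List[Record], truth: List[Record]) -> None:
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must describe the same documents")


def field_level_accuracy(predicted: List[Record], truth: List[Record]) -> float:
    """Correct fields divided by all ground-truth fields, pooled across documents."""
    _check_aligned(predicted, truth)
    correct = total = 0
    for pred, gt in zip(predicted, truth):
        for field, expected in gt.items():
            total += 1
            if normalize(pred.get(field)) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0


def document_level_accuracy(predicted: List[Record], truth: List[Record]) -> float:
    """Share of documents in which every ground-truth field matches."""
    _check_aligned(predicted, truth)
    if not truth:
        return 0.0
    matched = sum(
        1
        for pred, gt in zip(predicted, truth)
        if all(normalize(pred.get(f)) == normalize(v) for f, v in gt.items())
    )
    return matched / len(truth)


# Hypothetical two-invoice sample: the second total is misread.
truth = [
    {"invoice_number": "INV-001", "total": "120.00", "vendor": "Vendor A"},
    {"invoice_number": "INV-002", "total": "75.50", "vendor": "Vendor B"},
]
predicted = [
    {"invoice_number": "INV-001", "total": "120.00", "vendor": "vendor a "},
    {"invoice_number": "INV-002", "total": "75.05", "vendor": "Vendor B"},
]
```

On this made-up sample, one wrong total gives 5/6 field-level accuracy but only 1/2 document-level accuracy, which is why a report that lists both is more informative than either alone.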

Understand performance at the level that matters.

Instead of a single average number, Invofox breaks results down in ways that reflect real production complexity — so your team focuses optimization where it has the greatest impact.

By document type

Invoices, contracts, tax forms (W-9), bills of lading, mortgage and loan applications, credit notes, delivery notes…

Invoices 99.6%
Contracts 97.2%
W-9 forms 99.9%
Bills of lading 98.4%
Credit notes 99.1%

By document source

Identify which layouts represent the most volume or variability — vendors for invoices, jurisdictions for standard forms, etc.

Vendor A · NL 99.4%
Vendor B · US-CA 98.1%
Vendor C · DE 99.6%
Vendor D · ES 97.8%
Vendor E · UK 99.2%
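Both breakdowns amount to grouping ground-truth comparisons by a key (document type or source) before averaging. A minimal sketch, assuming a hypothetical per-document tuple of (group, correct fields, total fields) and pooling fields within each group; the data is invented to show how an average can hide a weak source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (group, correct_fields, total_fields) for one document; hypothetical layout.
Result = Tuple[str, int, int]


def accuracy_by_group(results: Iterable[Result]) -> Dict[str, float]:
    """Field-level accuracy per group, pooling all fields within each group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, ok, n in results:
        correct[group] += ok
        total[group] += n
    return {g: correct[g] / total[g] for g in total if total[g] > 0}


def overall_accuracy(results: Iterable[Result]) -> float:
    """Single pooled accuracy across every group."""
    rows: List[Result] = list(results)
    fields = sum(n for _, _, n in rows)
    return sum(ok for _, ok, _ in rows) / fields if fields else 0.0


# Invented results: Vendor C dominates the volume, Vendor B struggles.
results = [
    ("Vendor A", 10, 10), ("Vendor A", 9, 10),
    ("Vendor B", 6, 10), ("Vendor B", 7, 10),
    ("Vendor C", 10, 10), ("Vendor C", 10, 10),
    ("Vendor C", 10, 10), ("Vendor C", 10, 10),
]
```

With these made-up numbers the overall figure is 90%, yet Vendor B sits at 65%: the kind of gap a single average hides.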

Errors aren't hidden — they're explained.

In real production, a percentage of documents will always present issues: poor image quality, missing data, corrupted files, highly inconsistent layouts. Reaching 100% automation isn't feasible regardless of the model. In high-volume deployments roughly 5–10% of documents fall into this category due to data-quality constraints alone.

5–10% flagged for review in high-volume deployments

Missing information 38%

Required fields not present in the source document.
Ambiguous layouts 27%

Multiple plausible interpretations for the same value.
Low-confidence extractions 22%

Confidence under threshold — flagged for review.
True edge cases 13%

Genuinely unusual cases the model hasn't seen yet.

Each report tells you exactly what failed, why, how often — and where action will have the biggest impact: feedback, threshold tuning, additional data or pipeline adjustments.
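As an illustration of threshold tuning and failure-type tallies, here is a hypothetical sketch: an extraction is flagged for review when its value is missing or its confidence falls under a threshold, and reviewed items that carry failure labels are summarized as percentages. The `Extraction` type, the 0.85 threshold and the label names are assumptions, not Invofox's actual rules; assigning labels like "ambiguous" or "edge case" would still come from review, so the sketch takes them as input.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Extraction:
    field: str
    value: Optional[str]  # None when the field was not found
    confidence: float     # model confidence in [0, 1]


def needs_review(e: Extraction, threshold: float = 0.85) -> bool:
    # Hypothetical rule: flag missing values or confidence under the threshold.
    return e.value is None or e.confidence < threshold


def failure_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Share of each failure type among reviewed items, in percent (1 decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


extractions = [
    Extraction("total", "120.00", 0.97),
    Extraction("due_date", None, 0.0),
    Extraction("vendor", "Vendor B", 0.62),
]
flagged = [e.field for e in extractions if needs_review(e)]

# Hypothetical labels assigned during review of flagged items.
labels = ["missing", "missing", "ambiguous", "low_confidence",
          "missing", "edge_case", "ambiguous", "low_confidence"]
```

Raising the threshold sends more extractions to review and lowers the risk of silent errors; lowering it does the opposite, which is the trade-off threshold tuning adjusts.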

Frequently asked questions.

What's included in a performance report?

Field-level and document-level accuracy, P95/P99 processing benchmarks, error distribution by failure type, missing or low-confidence fields, confidence thresholds, splitter/classifier performance and any custom analysis specific to your workflow.

How big does the sample need to be?

Enough to be statistically meaningful for your document mix: typically a few hundred documents covering the variety of types, sources and edge cases you actually receive in production.

Can we share the report internally?

Yes. It's specifically designed to be shared with technical teams, ops and executives without additional explanation. Each visual is self-contained.

What if we don't have ground truth?

We help you build it. A partial dataset is often enough to start, and we refine the methodology together as the project progresses.

How long does the iteration loop take?

It depends on document complexity and target accuracy, but the loop is bounded and visible end to end: you see progress after each iteration before signing off.

What format does the report come in?

A clean, branded PDF, plus a live dashboard view if you want to slice the data further. Ground-truth comparisons and methodology notes are included.
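"Statistically meaningful" can be made concrete with a confidence interval on measured accuracy. A minimal sketch, assuming the Wilson score interval (a standard interval for a binomial proportion, not necessarily what Invofox uses) and treating each document as one independent pass/fail trial; the function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Hypothetical samples with the same ~97-99% observed accuracy, different sizes.
large = wilson_interval(297, 300)
small = wilson_interval(29, 30)
```

For 297 of 300 documents correct, the interval is roughly 97.1% to 99.7%; for 29 of 30 it spans roughly 83% to 99%, several times wider. Fields within one document tend to fail together, so counting fields instead of documents would overstate how precise the estimate is.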


Transparency you don't have to ask for.

Real documents, real data — a core principle.
