| Concept | Period | Quantity | Unit price | Amount |
|---|---|---|---|---|
| Energy consumption | 01–31 May 2026 | 412 kWh | £0.1512 / kWh | £62.30 |
| Standing charge | 31 days | 31 days | £0.5968 / day | £18.50 |
| Climate Change Levy | 01–31 May 2026 | 412 kWh | £0.0109 / kWh | £4.50 |
| VAT (20%) | — | — | — | £17.05 |
Utility Bill OCR for AP, ESG and every team that needs the data.
Extract consumption, reading types, time-of-use tariffs, full charge breakdowns and multi-supply records, typed and structured. Any provider, any format, in under 10 seconds.
- 500 free pages included
If your document has data, we can extract it.
Whatever team you're on, we already extract the fields you need from the same utility bill. Read the technical deep dive →
- Identity platforms and fintechs verify name, address, issue date and provider. The real challenge is coverage across the long tail of small municipal and co-op providers that generic OCR misses.
- Installers size systems from kWh consumption history, so the extraction has to flag estimated readings explicitly. Otherwise, proposals get built on non-representative figures.
- Energy teams ingest electricity, gas and water bills across portfolios: a typical 200-site × 15-utility footprint produces thousands of documents per month, in every layout and format imaginable.
- Scope 2 reporting needs measured consumption, not estimated. Extraction must surface the reading type per period and supply point, or the carbon report is unreliable.
- Property managers process bills across hundreds of units. Different account formats, billing cycles and meter conventions per utility make manual reconciliation unviable.
- Finance teams post to ERPs and audit charges against contracted tariff terms, which requires the full hierarchical charge breakdown, not just the total amount due.
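To make the reading-type requirement concrete, here is a minimal Python sketch that sums consumption per supply point while keeping measured and estimated kWh in separate buckets, so only measured figures feed a Scope 2 or system-sizing calculation. It assumes records shaped like the `supply_points` example in the schema section of this page (`cups`, `consumption_kwh`, `reading_type`); the function name and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def split_by_reading_type(document: dict) -> dict:
    """Sum kWh per supply point, keeping measured and estimated readings separate.

    Hypothetical input shape: each entry in document["supply_points"] has
    "cups", "consumption_kwh" and "reading_type" keys, mirroring the example schema.
    """
    totals = defaultdict(lambda: {"measured": 0.0, "estimated": 0.0})
    for supply in document["supply_points"]:
        reading_type = supply["reading_type"]
        if reading_type not in ("measured", "estimated"):
            # Surface unexpected values instead of silently counting them.
            raise ValueError(f"unexpected reading_type: {reading_type!r}")
        totals[supply["cups"]][reading_type] += supply["consumption_kwh"]
    return dict(totals)


# Illustrative input: the identifiers and the second figure are made up.
bill = {
    "supply_points": [
        {"cups": "SUPPLY-A", "consumption_kwh": 412, "reading_type": "estimated"},
        {"cups": "SUPPLY-B", "consumption_kwh": 980, "reading_type": "measured"},
    ]
}
totals = split_by_reading_type(bill)
# Only measured kWh goes into the carbon calculation.
measured_kwh = sum(t["measured"] for t in totals.values())
```

Keeping estimated kWh in its own bucket, rather than dropping it, lets a report disclose how much of each period was estimated.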
Invofox vs. generic OCR and LLM pipelines.
Invoice parsers, form extractors, and LLM prompts produce partial output on utility documents. Here's what changes when the pipeline is built for them. Read the full breakdown →
Invofox is trained on utility bills end to end. Generic pipelines rely on invoice parsers, form extractors, or a prompt sent to an LLM.

| Capability | Invofox | Generic OCR & LLMs |
|---|---|---|
| Utility-specific model | Yes | No, generic invoice/form layer |
| Reading types (measured / estimated) | Yes | No |
| Multi-supply PDF splitting | Built-in | Custom build / extra processor |
| Custom schema fields | Config only | Retraining or app code |
| Multi-supply JSON structure | Supply points + charges nested | Flat JSON per page |
| Time-of-use tariff periods | Yes | No |
| Feedback loop | Automatic | Manual retraining |
| Roadmap stability | Stable | Mixed, some processors retiring |
| Maintenance burden | Low | Medium to high |
Hierarchical, not flat. Built for the bill's real structure.
Utility bills aren't flat. One PDF can carry multiple supply points, each with its own period, consumption, reading type and charge breakdown. Our schema mirrors that.
- Document (4 fields)
  - `invoice_number`: `UB-2026-05-1873` (string)
  - `provider`: `Iberdrola` (string)
  - `issue_date`: `2026-06-02` (date)
  - `due_date`: `2026-06-25` (date)
- `supply_points[]` (6 fields)
  - `cups`: `ES00220001…` (string)
  - `service_address`: `Av. Diagonal 1…` (string)
  - `period`: `2026-05-01 → 31` (object)
  - `consumption_kwh`: `412` (number)
  - `reading_type`: `"estimated"` (enum)
  - `tariff`: `2.0TD` (string)
- `charges_breakdown[]` (5 fields)
  - `concept`: `"energy"` (enum)
  - `qty`: `412` kWh (number)
  - `unit_price`: `0.1268` €/kWh (number)
  - `amount`: `52.30` (number)
  - `tariff_band`: `"P1 peak"` (enum)
- Payment & ESG (5 fields)
  - `amount_due`: `87.45` (number)
  - `currency`: `"EUR"` (string)
  - `iban_last4`: `"1234"` (string)
  - `payment_method`: `"direct_debit"` (enum)
  - `co2eq_kg`: `53.6` (number)
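One way to consume a hierarchy like the one above in typed code is a set of nested dataclasses. The Python sketch below is illustrative only: the class names, the loader and the exact JSON shape are assumptions, not Invofox's published response format. It nests `charges_breakdown` inside each supply point (based on each supply point having its own charge breakdown), and omits `service_address`, `period` and the Payment & ESG group for brevity. The `cups` value is a hypothetical placeholder.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

READING_TYPES = {"measured", "estimated"}


@dataclass
class Charge:
    concept: str
    qty: float
    unit_price: float
    amount: float
    tariff_band: Optional[str] = None


@dataclass
class SupplyPoint:
    cups: str
    consumption_kwh: float
    reading_type: str  # "measured" or "estimated"
    tariff: str
    charges_breakdown: List[Charge]


@dataclass
class UtilityBill:
    invoice_number: str
    provider: str
    issue_date: date
    due_date: date
    supply_points: List[SupplyPoint]


def parse_supply_point(raw: dict) -> SupplyPoint:
    # Reject reading types outside the enum instead of passing them through.
    if raw["reading_type"] not in READING_TYPES:
        raise ValueError(f"unexpected reading_type: {raw['reading_type']!r}")
    return SupplyPoint(
        cups=raw["cups"],
        consumption_kwh=float(raw["consumption_kwh"]),
        reading_type=raw["reading_type"],
        tariff=raw["tariff"],
        # Assumed nesting: each supply point carries its own charge lines.
        charges_breakdown=[Charge(**c) for c in raw["charges_breakdown"]],
    )


def parse_bill(payload: str) -> UtilityBill:
    raw = json.loads(payload)
    return UtilityBill(
        invoice_number=raw["invoice_number"],
        provider=raw["provider"],
        issue_date=date.fromisoformat(raw["issue_date"]),
        due_date=date.fromisoformat(raw["due_date"]),
        supply_points=[parse_supply_point(sp) for sp in raw["supply_points"]],
    )


sample = """{
  "invoice_number": "UB-2026-05-1873", "provider": "Iberdrola",
  "issue_date": "2026-06-02", "due_date": "2026-06-25",
  "supply_points": [{
    "cups": "EXAMPLE-CUPS", "consumption_kwh": 412,
    "reading_type": "estimated", "tariff": "2.0TD",
    "charges_breakdown": [{"concept": "energy", "qty": 412,
      "unit_price": 0.1268, "amount": 52.30, "tariff_band": "P1 peak"}]
  }]
}"""
bill = parse_bill(sample)
```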
From PDF to typed data, every step is utility-aware.
Splitting, classification, extraction, validation and review. Each layer of the pipeline is designed for what utility bills actually are: hierarchical, multi-supply, multi-tariff documents.
Multi-Supply Identification
A single bill can carry electricity and gas on the same page. Each supply point is identified and extracted as a separate record with its own period, consumption, meter reference and charge breakdown. Never collapsed into a single flat object.
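Because each supply point stays a separate record, downstream code can fan a dual-fuel bill out into one row per supply point, for example when loading a warehouse table. A minimal Python sketch, assuming a hypothetical payload where every supply point has `cups`, a `supply_type` field (not part of the example schema on this page), `consumption_kwh` and its own `charges_breakdown`; all identifiers and amounts are illustrative.

```python
def rows_per_supply_point(bill: dict) -> list:
    """Fan a multi-supply bill out into one flat row per supply point.

    Document-level fields are copied onto every row; each row's charges_total
    sums only that supply point's own charge lines, so electricity and gas
    never get mixed into one figure.
    """
    rows = []
    for supply in bill["supply_points"]:
        rows.append({
            "invoice_number": bill["invoice_number"],
            "cups": supply["cups"],
            "supply_type": supply["supply_type"],  # hypothetical field
            "consumption_kwh": supply["consumption_kwh"],
            "charges_total": round(sum(c["amount"] for c in supply["charges_breakdown"]), 2),
        })
    return rows


# Hypothetical dual-fuel bill: identifiers and amounts are illustrative only.
bill = {
    "invoice_number": "EXAMPLE-001",
    "supply_points": [
        {"cups": "ELEC-1", "supply_type": "electricity", "consumption_kwh": 412,
         "charges_breakdown": [{"amount": 52.30}, {"amount": 18.50}]},
        {"cups": "GAS-1", "supply_type": "gas", "consumption_kwh": 1250,
         "charges_breakdown": [{"amount": 61.25}, {"amount": 9.10}]},
    ],
}
rows = rows_per_supply_point(bill)
```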
Utility-Aware Field Extraction
Consumption figures, tariff periods, supply-point identifiers and regulatory charge breakdowns are fields that don't exist in a standard invoice model. Generic extractors return amount and due date. This returns the data your pipeline actually needs.
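Per-band unit prices are what make a contract audit possible. The Python sketch below compares each charge line's billed `unit_price` with a contracted rate for its `tariff_band` and reports deviations. The band names, rates, tolerance and function name are hypothetical; only the field names follow the example schema on this page.

```python
from decimal import Decimal


def audit_unit_prices(charges, contracted, tolerance=Decimal("0.0005")):
    """Flag charge lines whose billed unit price differs from the contracted rate.

    `contracted` maps a tariff band to its agreed unit price. Prices are compared
    as Decimal to avoid float noise; the tolerance is an arbitrary example value.
    """
    findings = []
    for charge in charges:
        band = charge.get("tariff_band")
        if band not in contracted:
            findings.append(f"{charge['concept']}: no contracted rate for band {band!r}")
            continue
        # str() first so the Decimal reflects the printed value, not binary float noise.
        billed = Decimal(str(charge["unit_price"]))
        expected = contracted[band]
        if abs(billed - expected) > tolerance:
            findings.append(f"{charge['concept']} ({band}): billed {billed}, contracted {expected}")
    return findings


# Hypothetical bands and rates, for illustration only.
charges = [
    {"concept": "energy", "tariff_band": "P1 peak", "unit_price": 0.1268},
    {"concept": "energy", "tariff_band": "P3 off-peak", "unit_price": 0.0951},
]
contracted = {"P1 peak": Decimal("0.1268"), "P3 off-peak": Decimal("0.0890")}
findings = audit_unit_prices(charges, contracted)
```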
Document Classification
Every document is classified before extraction: is it a utility bill, and if so, which type (electricity, gas, water or telecom)? Mixed-document pipelines get a reliable gate. Anything that isn't a utility bill is flagged upfront, not silently misprocessed.
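On the consuming side, a classification gate usually ends up as a small routing function. This Python sketch assumes a hypothetical classifier output made of a label string and a confidence score; the label values, the 0.9 threshold and the queue names are illustrative assumptions, not Invofox output.

```python
from enum import Enum


class DocType(Enum):
    ELECTRICITY = "electricity"
    GAS = "gas"
    WATER = "water"
    TELECOM = "telecom"
    NOT_UTILITY = "not_utility"


def route(label: str, confidence: float, threshold: float = 0.9) -> str:
    """Decide what happens to a document after classification.

    Unknown labels and low-confidence results go to human review; confident
    non-utility documents are rejected before extraction ever runs.
    """
    try:
        doc_type = DocType(label)
    except ValueError:
        return "review"
    if confidence < threshold:
        return "review"
    if doc_type is DocType.NOT_UTILITY:
        return "reject"
    return f"extract:{doc_type.value}"
```

Checking confidence before the utility/non-utility decision means a borderline "not a utility bill" verdict is reviewed rather than discarded.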
Schema & Arithmetic Validation
Every record is validated against your schema rules and the bill's own arithmetic before delivery. Line items are checked against the total. Low-confidence fields are flagged, not silently passed through to your ERP.
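The arithmetic part of this check can be sketched in a few lines of Python with `decimal`: each line's quantity × unit price is compared with its amount, then the sum of lines plus tax is compared with the billed total. Assumptions for illustration: a single tax rate applied to the whole subtotal, half-up rounding to the cent, and a one-cent tolerance to absorb rounding; the function names and figures are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(value: Decimal) -> Decimal:
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def validate_arithmetic(lines, tax_rate, total, tolerance=CENT):
    """Check each line's qty x unit_price against its amount, then the grand total.

    Returns a list of human-readable issues; an empty list means the bill adds up.
    """
    issues = []
    subtotal = Decimal("0")
    for line in lines:
        expected = to_cents(line["qty"] * line["unit_price"])
        if abs(expected - line["amount"]) > tolerance:
            issues.append(f"{line['concept']}: expected {expected}, billed {line['amount']}")
        subtotal += line["amount"]
    # Assumes one tax rate over the whole subtotal.
    expected_total = to_cents(subtotal * (1 + tax_rate))
    if abs(expected_total - total) > tolerance:
        issues.append(f"total: expected {expected_total}, billed {total}")
    return issues


# Illustrative figures, not taken from a real bill.
lines = [
    {"concept": "energy", "qty": Decimal("400"), "unit_price": Decimal("0.15"), "amount": Decimal("60.00")},
    {"concept": "standing_charge", "qty": Decimal("30"), "unit_price": Decimal("0.50"), "amount": Decimal("15.00")},
]
ok = validate_arithmetic(lines, tax_rate=Decimal("0.20"), total=Decimal("90.00"))
bad = validate_arithmetic(lines, tax_rate=Decimal("0.20"), total=Decimal("91.00"))
```

Here `ok` is an empty list, while `bad` reports the one-pound mismatch on the total instead of letting it through to the ERP.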
Automatic Feedback Loop
Operator corrections feed directly back into the model. Accuracy on new providers and layouts improves automatically, with no manual retraining and no engineering work. Most customers reach their target threshold within days of going live.
Frequently asked questions.
Want the full technical deep dive? Read the blog post