
Utility Bill OCR for AP, ESG and every team that needs the data.

Extract consumption, reading types, time-of-use tariffs, full charge breakdowns and multi-supply records, typed and structured. Any provider, any format, in under 10 seconds.

  • 500 free pages included
Sample US utility bill from Liberty Power, fictional data
Extracted Data · Utility Bill · conf 0.99
{
  "document_type": "utility_bill",
  "extracted_at": "2023-10-12T18:45:00Z",
  "provider": "Liberty Power",
  "service_type": "residential_electricity",
  "provider_address": "1200 Energy Way, Dallas, TX 75201",
  "customer": {
    "name": "Maria S. Ramirez",
    "service_address": "456 Oak Ave, Austin, TX 78704"
  },
  "account_number": "1234 5678 9012",
  "statement_date": "2023-10-12",
  "due_date": "2023-11-05",
  "period": {
    "from": "2023-09-10",
    "to": "2023-10-10",
    "days": 30
  },
  "consumption_kwh": 927,
  "billing_summary": {
    "previous_balance": 110.20,
    "payments_received": -110.20,
    "current_charges": 145.67,
    "total_amount_due": 145.67
  },
  "currency": "USD",
  "co2eq_kg": 180.4
}
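As a rough illustration of consuming output like the sample above, the sketch below loads part of it and checks that the balance arithmetic holds. The field names are taken from the sample; the `balance_is_consistent` helper and its one-cent tolerance are assumptions made for this example, not part of any Invofox API.

```python
import json
from decimal import Decimal

# A subset of the sample output shown above.
SAMPLE = """
{
  "consumption_kwh": 927,
  "period": {"from": "2023-09-10", "to": "2023-10-10", "days": 30},
  "billing_summary": {
    "previous_balance": 110.20,
    "payments_received": -110.20,
    "current_charges": 145.67,
    "total_amount_due": 145.67
  },
  "currency": "USD"
}
"""


def balance_is_consistent(summary, tolerance=Decimal("0.01")):
    """Hypothetical check: previous balance + payments + current charges == total due."""
    expected = (
        summary["previous_balance"]
        + summary["payments_received"]
        + summary["current_charges"]
    )
    return abs(expected - summary["total_amount_due"]) <= tolerance


# parse_float=Decimal keeps monetary values exact instead of binary floats.
bill = json.loads(SAMPLE, parse_float=Decimal)

consistent = balance_is_consistent(bill["billing_summary"])
avg_daily_kwh = round(bill["consumption_kwh"] / bill["period"]["days"], 1)
print(consistent, avg_daily_kwh)
```

Parsing amounts as `Decimal` is a design choice for this sketch: it avoids float rounding noise when summing currency values.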

If your document has data, we can extract it.

Whatever team you're on, we already extract the fields you need from the same utility bill. Read the technical deep dive →

Greenfield Energy · Business electricity supply
Invoice no.: UB-2026-05-1873
Customer: ACME Ltd. · Company No. 12345678
Supply address: 123 Liverpool Street London EC2M 7PY · United Kingdom
Billing period: 01 May → 31 May 2026
Issue date: 02 Jun 2026

Concept              Period           Quantity  Unit price     Amount
Energy consumption   01–31 May 2026   412 kWh   £0.1512 / kWh  £62.30
Standing charge      31 days          31 days   £0.5968 / day  £18.50
Climate Change Levy  01–31 May 2026   412 kWh   £0.0109 / kWh  £4.50
VAT (20%)                                                      £17.05
Total due                                                      £102.35

Last 12 months (kWh) · Reading: estimated
MPAN: 1900-0001-1234-5678
Tariff: Business Fixed 12m
Carbon footprint: 53.6 kg CO₂e
Due date: 25 Jun 2026
  • Identity platforms and fintechs verify name, address, issue date and provider, but the real challenge is coverage across the long tail of small municipal and co-op providers that generic OCR misses.

  • Installers size systems from kWh consumption history, and the extraction has to flag estimated readings explicitly, or proposals get built on non-representative figures.

  • Energy teams ingest electricity, gas and water across portfolios. A typical 200-site × 15-utility footprint produces thousands of documents per month, in every layout and format imaginable.

  • Scope 2 reporting needs measured consumption, not estimates. Extraction must surface the reading type per period and supply point, or the carbon report is unreliable.

  • Property managers process bills across hundreds of units. Different account formats, billing cycles and meter conventions per utility make manual reconciliation unviable.

  • Finance teams post to ERPs and audit charges against contracted tariff terms, which requires the full hierarchical breakdown, not just the total amount due.
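Several of the use cases above hinge on the reading type. A minimal sketch of how a consumer might separate measured readings from estimated ones follows. It assumes each record carries a `reading_type` field with the values `measured` or `estimated`; the `split_by_reading_type` helper and the March and April figures are invented for illustration.

```python
def split_by_reading_type(supply_points):
    """Separate measured readings from everything else (estimated or missing)."""
    measured, flagged = [], []
    for sp in supply_points:
        target = measured if sp.get("reading_type") == "measured" else flagged
        target.append(sp)
    return measured, flagged


# Illustrative consumption history; only the May record mirrors the sample bill.
history = [
    {"period": "2026-03", "consumption_kwh": 398, "reading_type": "measured"},
    {"period": "2026-04", "consumption_kwh": 405, "reading_type": "measured"},
    {"period": "2026-05", "consumption_kwh": 412, "reading_type": "estimated"},
]

measured, flagged = split_by_reading_type(history)

# Only measured consumption feeds sizing or Scope 2 figures.
measured_kwh = sum(sp["consumption_kwh"] for sp in measured)
flagged_periods = [sp["period"] for sp in flagged]
print(measured_kwh, flagged_periods)
```

Treating a missing `reading_type` as flagged rather than measured is a deliberately conservative choice in this sketch.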

Invofox vs. generic OCR and LLM pipelines.

Invoice parsers, form extractors, and LLM prompts produce partial output on utility documents. Here's what changes when the pipeline is built for them. Read the full breakdown →

Purpose-built

Invofox

Trained on utility bills end to end.

  • Utility-specific model: Yes
  • Reading types (measured / estimated): Yes
  • Multi-supply PDF splitting: Built-in
  • Custom schema fields: Config only
  • Multi-supply JSON structure: Supply points + charges nested
  • Time-of-use tariff periods: Yes
  • Feedback loop: Automatic
  • Roadmap stability: Stable
  • Maintenance burden: Low
Generic

Generic OCR & LLMs

Invoice parsers, form extractors, or a prompt sent to an LLM.

  • Utility-specific model: No, generic invoice/form layer
  • Reading types (measured / estimated): No
  • Multi-supply PDF splitting: Custom build / extra processor
  • Custom schema fields: Retraining or app code
  • Multi-supply JSON structure: Flat JSON per page
  • Time-of-use tariff periods: No
  • Feedback loop: Manual retraining
  • Roadmap stability: Mixed, some processors retiring
  • Maintenance burden: Medium to high

Hierarchical, not flat. Built for the bill's real structure.

Utility bills aren't flat. One PDF can carry multiple supply points, each with its own period, consumption, reading type and charge breakdown. Our schema mirrors that.

utility_bill_schema.json
  • Document (4 fields)
    1. invoice_number (string): UB-2026-05-1873
    2. provider (string): Iberdrola
    3. issue_date (date): 2026-06-02
    4. due_date (date): 2026-06-25
  • supply_points [ ] (6 fields)
    1. cups (string): ES00220001…
    2. service_address (string): Av. Diagonal 1…
    3. period (object): 2026-05-01 → 31
    4. consumption_kwh (number): 412
    5. reading_type (enum): "estimated"
    6. tariff (string): 2.0TD
  • charges_breakdown [ ] (5 fields)
    1. concept (enum): "energy"
    2. qty (number): 412 kWh
    3. unit_price (number): 0.1268 €/kWh
    4. amount (number): 52.30
    5. tariff_band (enum): "P1 peak"
  • Payment & ESG (5 fields)
    1. amount_due (number): 87.45
    2. currency (string): "EUR"
    3. iban_last4 (string): "1234"
    4. payment_method (enum): "direct_debit"
    5. co2eq_kg (number): 53.6
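One way to picture the hierarchy in the schema above is as nested typed records. The sketch below is an assumption about how a consumer could model it in Python, not the delivered format: the class names are hypothetical, the charges are nested under each supply point as an interpretation of "supply points + charges nested", and only some of the payment fields are included. Truncated example values are kept truncated.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Literal


@dataclass
class Charge:
    concept: str
    qty: Decimal
    unit_price: Decimal
    amount: Decimal
    tariff_band: str


@dataclass
class SupplyPoint:
    cups: str
    service_address: str
    period_from: date
    period_to: date
    consumption_kwh: Decimal
    reading_type: Literal["measured", "estimated"]
    tariff: str
    charges: list[Charge] = field(default_factory=list)


@dataclass
class UtilityBill:
    invoice_number: str
    provider: str
    issue_date: date
    due_date: date
    amount_due: Decimal
    currency: str
    supply_points: list[SupplyPoint] = field(default_factory=list)


# Example values copied from the schema preview above.
bill = UtilityBill(
    invoice_number="UB-2026-05-1873",
    provider="Iberdrola",
    issue_date=date(2026, 6, 2),
    due_date=date(2026, 6, 25),
    amount_due=Decimal("87.45"),
    currency="EUR",
    supply_points=[
        SupplyPoint(
            cups="ES00220001…",
            service_address="Av. Diagonal 1…",
            period_from=date(2026, 5, 1),
            period_to=date(2026, 5, 31),
            consumption_kwh=Decimal("412"),
            reading_type="estimated",
            tariff="2.0TD",
            charges=[
                Charge("energy", Decimal("412"), Decimal("0.1268"),
                       Decimal("52.30"), "P1 peak"),
            ],
        )
    ],
)

# Each supply point keeps its own period, reading type and charges.
estimated = [sp.cups for sp in bill.supply_points if sp.reading_type == "estimated"]
period_days = (bill.supply_points[0].period_to - bill.supply_points[0].period_from).days + 1
print(estimated, period_days)
```

Because the list fields use `default_factory`, a bill with several supply points is built by adding more `SupplyPoint` entries without touching the document-level fields.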

From PDF to typed data, every step is utility-aware.

Splitting, classification, extraction, validation and review: each layer of the pipeline is designed for what utility bills actually are, namely hierarchical, multi-supply, multi-tariff documents.

  1. Multi-Supply Identification

    A single bill can carry electricity and gas on the same page. Each supply point is identified and extracted as a separate record with its own period, consumption, meter reference and charge breakdown, never collapsed into a single flat object.

  2. Utility-Aware Field Extraction

    Consumption figures, tariff periods, supply-point identifiers and regulatory charge breakdowns are fields that don't exist in a standard invoice model. Generic extractors return amount and due date. This returns the data your pipeline actually needs.

  3. Document Classification

    Every document is classified before extraction: first as utility bill or not, then by type (electricity, gas, water or telecom). Mixed-document pipelines get a reliable gate. Anything that isn't a utility bill is flagged upfront, not silently misprocessed.

  4. Schema & Arithmetic Validation

    Every record is validated against your schema rules and the bill's own arithmetic before delivery. Line items are checked against the total. Low-confidence fields are flagged, not silently passed through to your ERP.

  5. Automatic Feedback Loop

    Operator corrections feed directly back into the model. Accuracy on new providers and layouts improves automatically, with no manual retraining and no engineering work. Most customers reach their target threshold within days of going live.
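The classification gate (step 3) and the arithmetic and confidence checks (step 4) can be sketched as follows. This is a simplified illustration under stated assumptions, not Invofox's implementation: the `gate` and `validate` functions, the `is_utility_bill`, `utility_type`, `charges`, `total_due` and `confidence` field names, and the 0.90 threshold are all hypothetical. The amounts are those of the Greenfield Energy sample bill shown earlier.

```python
from decimal import Decimal

UTILITY_TYPES = {"electricity", "gas", "water", "telecom"}


def gate(classification):
    """Return the utility type to extract, or None if the document should be flagged."""
    if (classification.get("is_utility_bill")
            and classification.get("utility_type") in UTILITY_TYPES):
        return classification["utility_type"]
    return None


def validate(record, min_confidence=0.90, tolerance=Decimal("0.01")):
    """Return a list of issues; an empty list means the record passed."""
    issues = []
    # Arithmetic check: line items must add up to the stated total.
    line_total = sum((Decimal(c["amount"]) for c in record["charges"]), Decimal("0"))
    if abs(line_total - Decimal(record["total_due"])) > tolerance:
        issues.append(f"line items sum to {line_total}, total is {record['total_due']}")
    # Confidence check: flag weak fields instead of passing them through.
    for name, conf in record.get("confidence", {}).items():
        if conf < min_confidence:
            issues.append(f"low confidence on {name}: {conf}")
    return issues


record = {
    "charges": [
        {"concept": "Energy consumption", "amount": "62.30"},
        {"concept": "Standing charge", "amount": "18.50"},
        {"concept": "Climate Change Levy", "amount": "4.50"},
        {"concept": "VAT (20%)", "amount": "17.05"},
    ],
    "total_due": "102.35",
    "confidence": {"total_due": 0.99, "tariff": 0.97},
}

doc_type = gate({"is_utility_bill": True, "utility_type": "electricity"})
issues = validate(record) if doc_type else ["not a utility bill"]
print(doc_type, issues)
```

Returning a list of issues rather than raising on the first failure lets a review queue show every problem with a record at once.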

Frequently asked questions.

Will it work with my regional or municipal utility provider?

Yes. The model is trained on genuine layout diversity across dozens of issuers, including small municipal providers and rural co-ops. Accuracy on a brand-new issuer starts meaningfully higher than that of a model trained on a curated subset, and any issuer-specific gaps close within days through the automatic feedback loop.

Want the full technical deep dive? Read the blog post