Disclaimer
I’m the CTO of a company that builds document parsing software, so yes — I’m biased and I definitely have a horse in this race.
That said, building in this space forced me to evaluate and stress-test basically every OCR, VLM, LLM, and “document AI” product I could get my hands on. This post isn’t a hit piece; it’s a technical rant after dealing with the same issues one too many times in production.
Azure Document Intelligence (formerly Form Recognizer) looks great on paper: managed OCR, prebuilt models, tight Azure integration. For small volumes and simple workflows, it mostly does what it says.
Things start breaking down once you push it into high-volume, latency-sensitive, production workloads.
This post focuses specifically on Azure’s Read model — the core OCR engine that everything else builds on. I’m not covering custom or invoice-specific models, just the foundational OCR layer.
If you’re evaluating OCR vendors for real production usage, here are the issues you’ll almost certainly run into.
Issue #1: No webhooks. Polling is mandatory.
Azure Document Intelligence has no webhook or callback support. Every async request must be polled.
Which, apparently, is still a thing in 2026.
You submit a document, get an operation ID back, and then repeatedly ask Azure whether it’s done yet. There’s no alternative.
Azure recommends polling every 1–2 seconds. Poll faster and you’ll hit rate limits. Poll slower and latency goes up.
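For reference, the loop everyone ends up writing looks roughly like this. It's a minimal sketch, not Azure's SDK: `get_status` is a hypothetical callable standing in for one GET against the operation URL, and the status strings are assumptions about what the service returns.

```python
import time
from typing import Callable, Tuple

# Assumed terminal states; anything else is treated as "still working".
TERMINAL = {"succeeded", "failed"}


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, int]:
    """Poll until a terminal status; return (status, number_of_GETs)."""
    deadline = clock() + timeout_s
    polls = 0
    while True:
        status = get_status()  # one GET, counted against the GET quota
        polls += 1
        if status in TERMINAL:
            return status, polls
        # Give up if the next poll would land past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"still {status!r} after {polls} polls")
        sleep(interval_s)
```

Every pass through that loop is a billable-in-quota GET that returns nothing useful until the last one. `sleep` and `clock` are injectable only so the loop can be exercised without real waiting.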
What this looks like in practice
I measured polling overhead across:
- Small (25 KB, 1-page) documents
- Large (1.6 MB, 100-page) PDFs
- Conservative (1s) vs aggressive (100ms) polling
The results are unintuitive but consistent:
- 75–90% of total processing time is spent polling, not OCR
- Cutting polling intervals from 1s → 100ms:
  - Saves ~20–30% total time
  - Increases request volume 6–7×
- Rate limits make aggressive polling unusable at scale
Polling burns your GET quota while doing no useful work.
Why this matters architecturally
This isn’t something you can “optimize away”:
- You waste compute waiting
- You waste requests polling
- You hit rate limits before CPU or throughput limits
- Concurrency collapses under load
A webhook would eliminate all of this overhead. Polling isn’t a tuning problem — it’s an architectural limitation.
Issue #2: Rate limits kill horizontal scaling
Azure’s standard tier rate limits are:
- POST (analyze): 15 TPS
- GET (polling): 50 TPS
Because polling is mandatory, GET becomes the real bottleneck.
With recommended polling:
- Each document consumes ~1 GET/sec while processing
- That caps you at ~50 concurrent documents per region
Now consider a very normal batch workload:
Process 5,000 documents in 10 minutes
At ~9 seconds average processing time, that works out to:
- ~8.3 documents/sec submission (5,000 documents / 600 seconds)
- ~75 concurrent documents (8.3 documents/sec × 9 seconds in flight)
That’s already 50% over what a single region can handle.
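The arithmetic here is just Little's law (items in flight = arrival rate × time in system). A quick sketch using the numbers from this section; the function names are mine, and the one-GET-per-second-per-document figure is the recommended polling rate assumed above:

```python
def required_concurrency(docs: int, window_s: float, avg_processing_s: float) -> float:
    """Little's law: documents in flight = arrival rate * time in system."""
    arrival_rate = docs / window_s  # documents per second
    return arrival_rate * avg_processing_s


def max_concurrent_docs(get_tps: float, polls_per_doc_per_s: float) -> float:
    """How many documents can be polled at once within the GET quota."""
    return get_tps / polls_per_doc_per_s


need = required_concurrency(docs=5000, window_s=600, avg_processing_s=9)
cap = max_concurrent_docs(get_tps=50, polls_per_doc_per_s=1.0)
print(f"need ~{need:.0f} in flight, one region supports ~{cap:.0f}")
```

Plugging in 100ms polling (10 GETs per second per document) drops the per-region ceiling to about 5 concurrent documents, which is why aggressive polling stops being an option so quickly.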
The workaround (spoiler: it’s ugly)
To scale, you must:
- Deploy Document Intelligence in multiple regions
- Often across multiple Azure subscriptions
- Build a custom orchestrator that:
  - Load-balances POSTs
  - Tracks GET usage separately
  - Handles regional failures
Azure provides none of this out of the box.
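To give a sense of what that orchestrator involves, here is a stripped-down sketch of the routing core: one client-side token bucket per region for POSTs and another for GETs. `TokenBucket` and `RegionRouter` are hypothetical names of mine, the defaults mirror the 15/50 TPS limits above, and a real version would also need persistence, retries, and shared state across workers.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class TokenBucket:
    """Client-side budget: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def available(self) -> float:
        """Refill based on elapsed time and return the current balance."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return self.tokens

    def try_take(self) -> bool:
        if self.available() >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RegionRouter:
    """Tracks POST and GET budgets per region and skips unhealthy regions."""

    def __init__(self, regions: Iterable[str], post_tps: float = 15.0,
                 get_tps: float = 50.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        names = list(regions)
        self.post: Dict[str, TokenBucket] = {
            r: TokenBucket(post_tps, post_tps, clock) for r in names}
        self.get: Dict[str, TokenBucket] = {
            r: TokenBucket(get_tps, get_tps, clock) for r in names}
        self.healthy: Dict[str, bool] = {r: True for r in names}

    def pick_for_submit(self) -> Optional[str]:
        """Choose the healthy region with the most POST budget left."""
        candidates = [r for r in self.post if self.healthy[r]]
        if not candidates:
            return None
        best = max(candidates, key=lambda r: self.post[r].available())
        return best if self.post[best].try_take() else None

    def may_poll(self, region: str) -> bool:
        """Spend one GET token for a status check, if the region has any."""
        return self.get[region].try_take()

    def mark(self, region: str, healthy: bool) -> None:
        self.healthy[region] = healthy
```

When `pick_for_submit()` returns `None`, every healthy region is out of POST budget and the caller has to queue or back off. Keeping GET budgets separate matters because, as described above, polling runs out long before submissions do.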
Issue #3: Silent regional degradation
In any multi-region deployment, there are usually one or two regions performing significantly worse than the others.
Not always the same regions — but always some.
What I’ve repeatedly seen in production:
- Normal latency: ~5 seconds
- Degraded region: 60+ seconds
- Spikes in HTTP 500 errors, often in bursts
- No corresponding signal in Azure’s error or health dashboards — error charts remain flat
- A large number of HTTP 400 (InvalidRequest) responses that succeed immediately on retry with no modification to the request
  - These should clearly have been reported as server-side failures (500s), not client errors
- Azure status page: “All services operational”
- No alerts, no notifications
These slowdowns and error spikes can last hours.
The misleading error semantics make this particularly painful to debug:
- HTTP 500s surge without visibility in Azure’s monitoring
- HTTP 400s imply client-side issues, but retries succeed instantly, masking what are clearly transient backend failures
Because Issue #2 forces you into multi-region deployments, your system will happily continue routing traffic into degraded regions unless you actively detect and avoid them — compounding both latency and error rates.
What you’re forced to build
If you care about latency or SLAs, you must implement:
- Per-region latency tracking
- Per-region error-rate tracking (independent of Azure’s dashboards)
- Baselines and anomaly detection
- Automatic regional failover
- Alerting when a region degrades
This is monitoring and error-classification infrastructure Azure should provide — but doesn’t.
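A minimal version of the per-region tracking might look like the sketch below. `RegionHealth` is a hypothetical class, the 5-second baseline comes from the "normal latency" figure above, and the other thresholds (3× baseline, 20% errors, 10-sample minimum) are placeholder assumptions you would tune against your own traffic.

```python
from collections import deque
from statistics import median
from typing import Deque, Tuple


class RegionHealth:
    """Rolling window of (latency, success) samples for one region."""

    def __init__(self, window: int = 50, baseline_latency_s: float = 5.0,
                 latency_factor: float = 3.0, max_error_rate: float = 0.2,
                 min_samples: int = 10) -> None:
        self.samples: Deque[Tuple[float, bool]] = deque(maxlen=window)
        self.baseline_latency_s = baseline_latency_s
        self.latency_factor = latency_factor
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, latency_s: float, ok: bool) -> None:
        self.samples.append((latency_s, ok))

    def is_degraded(self) -> bool:
        # Too little data: do not fail over on a single bad request.
        if len(self.samples) < self.min_samples:
            return False
        latencies = [latency for latency, _ in self.samples]
        errors = sum(1 for _, ok in self.samples if not ok)
        error_rate = errors / len(self.samples)
        too_slow = median(latencies) > self.baseline_latency_s * self.latency_factor
        return too_slow or error_rate > self.max_error_rate
```

The median is used instead of the mean so that one 60-second outlier does not trip failover on its own. Given the error semantics described above, it is probably worth recording a 400 that succeeds on an unmodified retry as `ok=False` for the region, since that is what it effectively is.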
Issue #4: The 2,000-page hard limit
Azure has a hard cap:
- 2,000 pages per document (paid tier)
- Anything larger must be manually chunked
This is not an edge case.
Real-world documents that routinely exceed this:
- Mortgage packages
- Legal discovery
- Financial disclosures
Chunking sounds simple until you do it:
- Split PDFs
- Track page offsets
- Reassemble results
- Deal with tables and paragraphs split across chunks
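The page-offset bookkeeping in those steps is the easy part, and it still has off-by-one traps. A small sketch, assuming 1-based page numbers and assuming each chunk's result numbers its pages from 1 (both function names are mine):

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, max_pages: int = 2000) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most max_pages each."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


def to_global_page(chunk_first_page: int, page_in_chunk: int) -> int:
    """Map a page number inside a chunk back to the original document."""
    return chunk_first_page + page_in_chunk - 1


ranges = chunk_page_ranges(4500)
print(ranges)
```

None of this helps with the hard part, which is content that straddles a chunk boundary.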
Want to process chunks in parallel? Congratulations — you just made the rate-limit problem worse.
Large documents amplify every other issue: polling overhead, rate limits, and orchestration complexity.
Issue #5: Unpredictable file failures
Some files just… fail.
No clear pattern. No useful error messages.
Examples I’ve hit in production:
- PDFs that open fine everywhere → “corrupted”
- Images that render correctly → “invalid format”
- Barcode extraction failing entire documents
- Encrypted PDFs failing even after unlocking
Error messages are typically:
“An unexpected error occurred”
Which is about as actionable as it sounds.
The cost of workarounds
To survive in production you end up building:
- Retries with optional features (such as barcode extraction) switched off, which wastes requests
- PDF repair pipelines (Ghostscript, Poppler, ImageMagick)
- Image re-encoding and metadata stripping
- Fallback processing paths
All of this adds latency, cost, and complexity — and none of it is predictable.
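Structurally, the workaround pile ends up as a fallback chain along these lines. This is a sketch under assumptions: `analyze` stands in for whatever function submits bytes and returns a parsed result, and each repair is a hypothetical `bytes -> bytes` step (for example a wrapper around a Ghostscript rewrite) that you would supply yourself.

```python
from typing import Callable, List, Sequence, Tuple

Analyze = Callable[[bytes], dict]
Repair = Tuple[str, Callable[[bytes], bytes]]


def analyze_with_fallbacks(
    data: bytes, analyze: Analyze, repairs: Sequence[Repair]
) -> Tuple[dict, List[str]]:
    """Try the original bytes, then each repair in order.

    Returns (result, steps_tried) so you can log which path succeeded.
    """
    tried = ["original"]
    last_error: Exception
    try:
        return analyze(data), tried
    except Exception as err:
        last_error = err
    for name, repair in repairs:
        tried.append(name)
        try:
            # Each repair is applied to the original bytes, not chained.
            return analyze(repair(data)), tried
        except Exception as err:
            last_error = err
    raise RuntimeError(f"all attempts failed: {tried}") from last_error
```

Logging `steps_tried` per document is the only way I know to get any pattern out of these failures, since the service-side error text gives you nothing to group by. Note that every fallback step costs at least one more analyze request.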
To be fair: document processing is hard. These issues aren’t unique to Azure. But Azure gives you very little visibility into why something failed.
Smaller (but painful) issues
A few things that don’t break systems, but slowly drain your sanity:
- Pricing opacity: Pricing is per-page and feature-based, but responses don’t include cost information. If you want accurate accounting, you have to track it yourself.
- Studio vs API mismatches: “It works in the Studio” often does not mean it works via API. Different defaults, versions, and parameters lead to misleading POCs.
- Breaking SDK changes: API and SDK upgrades regularly introduce breaking changes, forcing codebase-wide migrations and accuracy re-validation.
Who Azure Document Intelligence is good for
Despite all this, it can be the right tool if:
- Your volumes are low
- Latency isn’t critical
- Documents are small and simple
- You’re deeply embedded in Azure
- You have strong internal platform teams
Just go in knowing what you’ll need to build around it.
Final thoughts
This isn’t meant to bash Azure. It’s a powerful platform with serious engineering behind it.
But once you operate at scale, many of the hardest problems aren’t accuracy — they’re architecture, limits, and operational complexity.
If you’re evaluating OCR vendors, these trade-offs matter before you’re locked in.
I wish I’d seen a post like this earlier.