
I’m the CTO of a company that builds document parsing software, so yes — I’m biased and I definitely have a horse in this race.
That said, building in this space forced me to evaluate and stress-test basically every OCR, VLM, LLM, and “document AI” product I could get my hands on. This post isn’t a hit piece — it’s a technical rant after dealing with the same issues one too many times in production.
Azure Document Intelligence (formerly Form Recognizer) looks great on paper: managed OCR, prebuilt models, tight Azure integration. For small volumes and simple workflows, it mostly does what it says.
Things start breaking down once you push it into high-volume, latency-sensitive production workloads.
This post focuses specifically on Azure’s Read model — the core OCR engine that everything else builds on. I’m not covering custom or invoice-specific models, just the foundational OCR layer.
If you’re evaluating OCR vendors for real production usage, here are the issues you’ll almost certainly run into.
Issue #1: No webhooks, only polling

Azure Document Intelligence has no webhook or callback support. Every async request must be polled.
Which, apparently, is still a thing in 2026.
You submit a document, get an operation ID back, and then repeatedly ask Azure whether it’s done yet. There’s no alternative.
Here’s the minimal polling loop Azure requires:
import os
import time

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

client = DocumentIntelligenceClient(
    endpoint=os.getenv("AZURE_DOCINTELLIGENCE_URL"),
    credential=AzureKeyCredential(os.getenv("AZURE_DOCINTELLIGENCE_KEY")),
)

with open("invoice.pdf", "rb") as f:
    document_bytes = f.read()

# Submit document - returns operation ID
poller = client.begin_analyze_document(
    "prebuilt-read",
    body=document_bytes,
    content_type="application/octet-stream",
)

# Must poll for results (no webhook callback available)
polling_attempts = 0
while not poller.done():
    polling_attempts += 1
    print(f"Poll attempt #{polling_attempts}: Checking status...")
    time.sleep(1)  # Azure recommends 1-2 second intervals

result = poller.result()

Azure recommends polling every 1–2 seconds. Poll faster and you’ll hit rate limits. Poll slower and latency goes up.
I measured polling overhead across a range of document sizes and workloads, and the results are unintuitive but consistent: polling burns your GET quota while doing no useful work.
This isn’t something you can “optimize away”: every in-flight document costs one GET per polling interval for as long as it takes to process, no matter how you tune the loop. A webhook would eliminate all of this overhead. Polling isn’t a tuning problem; it’s an architectural limitation.
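To make the waste concrete, here’s a tiny back-of-envelope sketch of how many status GETs a single document burns; the processing times are illustrative assumptions, not measured Azure figures:

# Rough polling overhead: one GET per interval while a document processes.
# The processing times below are illustrative assumptions, not benchmarks.
POLL_INTERVAL_S = 1.0  # Azure-recommended lower bound

def wasted_gets(processing_time_s: float, interval_s: float = POLL_INTERVAL_S) -> int:
    """GET requests spent asking 'is it done yet?' for one document."""
    return int(processing_time_s // interval_s)

for t in (5, 15, 60):  # assumed seconds of processing per document
    print(f"{t:>3}s document -> ~{wasted_gets(t)} status GETs")
# With a webhook, every one of these requests would simply not exist.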
Issue #2: Rate limits

Azure’s standard tier enforces separate rate limits on POST (submit) and GET (status) requests; the exact quotas depend on your tier and region. Because polling is mandatory, GET becomes the real bottleneck.
With the recommended 1–2 second polling interval, every in-flight document occupies a slice of your GET quota for its entire processing time.
Now consider a very normal batch workload: process 5,000 documents in 10 minutes. The GET traffic generated by polling alone already puts you roughly 50% over what a single region can handle.
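Here’s a sketch of that math; the average processing time and the regional GET quota are placeholder assumptions, not published Azure numbers, so substitute your own tier’s limits:

# Back-of-envelope throughput math for the batch above. The GET limit is a
# PLACEHOLDER; substitute your tier's actual regional quota.
DOCS = 5_000
WINDOW_S = 600           # 10 minutes
AVG_PROCESSING_S = 15    # assumed average per document
POLL_INTERVAL_S = 1.0

post_rate = DOCS / WINDOW_S                                        # ~8.3 POST/s
get_rate = DOCS * (AVG_PROCESSING_S / POLL_INTERVAL_S) / WINDOW_S  # ~125 GET/s

ASSUMED_GET_LIMIT = 83   # placeholder regional GET quota, requests/s
print(f"POSTs: {post_rate:.1f}/s, GETs: {get_rate:.1f}/s")
print(f"Over GET quota by {get_rate / ASSUMED_GET_LIMIT - 1:.0%}")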
To scale past a single region’s quota, you must deploy Document Intelligence resources in multiple regions, load-balance requests across them, and handle failover yourself. Azure provides none of this out of the box.
This is roughly what multi-region orchestration ends up looking like:
import time

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Deploy Document Intelligence resources in multiple regions
REGIONS = [
    {"endpoint": "https://eastus-docint.cognitiveservices.azure.com/", "key": "key1", "name": "East US"},
    {"endpoint": "https://eastus2-docint.cognitiveservices.azure.com/", "key": "key3", "name": "East US 2"},
    {"endpoint": "https://westus-docint.cognitiveservices.azure.com/", "key": "key2", "name": "West US"},
]

class MultiRegionOrchestrator:
    def __init__(self, regions):
        self.regions = regions
        self.current_index = 0
        # Track requests per region to respect rate limits
        self.region_stats = {
            region["name"]: {"post_count": 0, "get_count": 0, "last_reset": time.time()}
            for region in regions
        }

    def get_next_client(self):
        """Round-robin load balancing across regions."""
        region = self.regions[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.regions)
        return (
            DocumentIntelligenceClient(
                endpoint=region["endpoint"],
                credential=AzureKeyCredential(region["key"]),
            ),
            region["name"],
        )

    def process_document(self, document_bytes):
        """Process a document with multi-region failover."""
        attempts = 0
        last_error = None
        while attempts < len(self.regions):
            client, region_name = self.get_next_client()
            try:
                # Track the POST request
                self.region_stats[region_name]["post_count"] += 1
                # Submit the document
                poller = client.begin_analyze_document(
                    "prebuilt-read",
                    body=document_bytes,
                    content_type="application/octet-stream",
                )
                # Poll for results (each poll is a GET request)
                while not poller.done():
                    self.region_stats[region_name]["get_count"] += 1
                    time.sleep(1)
                return poller.result()
            except Exception as e:
                last_error = e
                attempts += 1
                print(f"Region {region_name} failed, trying next region...")
        raise Exception(f"All {len(self.regions)} regions failed: {last_error}")

# Usage
orchestrator = MultiRegionOrchestrator(REGIONS)

# Process multiple documents across regions
document_list = ["invoice1.pdf", "invoice2.pdf"]  # your documents here
for doc_path in document_list:
    with open(doc_path, "rb") as f:
        doc_bytes = f.read()
    result = orchestrator.process_document(doc_bytes)

Issue #3: Regional degradation

In any multi-region deployment, there are usually one or two regions performing significantly worse than the others.
Not always the same regions — but always some.
What I’ve repeatedly seen in production: one region’s latency climbs sharply, or its error rate spikes, while the other regions stay healthy. These slowdowns and error spikes can last hours.
The misleading error semantics make this particularly painful to debug.
Because Issue #2 forces you into multi-region deployments, your system will happily continue routing traffic into degraded regions unless you actively detect and avoid them — compounding both latency and error rates.
If you care about latency or SLAs, you must implement per-region health checks, latency tracking, and error classification yourself, as sketched below. This is monitoring and error-classification infrastructure Azure should provide, but doesn’t.
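Here’s a minimal sketch of the per-region health tracking you end up writing yourself; the window size, error threshold, and cooldown are arbitrary placeholders to tune for your workload:

import time
from collections import deque

class RegionHealth:
    """Rolling error-rate tracker: trips a region out of rotation when it degrades.
    Window, threshold, and cooldown are placeholders, not recommended values."""

    def __init__(self, window: int = 50, max_error_rate: float = 0.2,
                 cooldown_s: float = 300.0):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.max_error_rate = max_error_rate
        self.cooldown_s = cooldown_s
        self.tripped_at = None

    def record(self, success: bool) -> None:
        self.outcomes.append(success)
        if self.error_rate() > self.max_error_rate:
            self.tripped_at = time.time()  # take the region out of rotation

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def available(self) -> bool:
        if self.tripped_at is None:
            return True
        # Let the region back in after a cooldown to probe for recovery
        return time.time() - self.tripped_at > self.cooldown_s

Wiring available() into the orchestrator’s get_next_client() loop is enough to stop round-robin from feeding a degraded region.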
Issue #4: The 2,000-page hard cap

Azure has a hard cap of 2,000 pages per analyze request. This is not an edge case: real-world documents like mortgage packages and other multi-thousand-page bundles routinely exceed it.
Chunking sounds simple until you do it: you have to split the PDF, track each chunk’s page offset, re-map page numbers in the results, merge everything back in order, and deal with tables and paragraphs cut at chunk boundaries.
Here’s a simplified chunking example:
from pypdf import PdfReader, PdfWriter

AZURE_PAGE_LIMIT = 2000  # Azure's hard limit
CHUNK_SIZE = 2000        # Use maximum allowed size

# Read large document
pdf_reader = PdfReader("large_mortgage_package.pdf")
total_pages = len(pdf_reader.pages)  # e.g., 2,500 pages

# Calculate required chunks (ceiling division)
chunks_needed = (total_pages + CHUNK_SIZE - 1) // CHUNK_SIZE  # 2 chunks for 2,500 pages

# Split into chunks
for chunk_num in range(chunks_needed):
    start_page = chunk_num * CHUNK_SIZE
    end_page = min(start_page + CHUNK_SIZE, total_pages)

    # Create chunk PDF
    pdf_writer = PdfWriter()
    for page_idx in range(start_page, end_page):
        pdf_writer.add_page(pdf_reader.pages[page_idx])

    # Save and process chunk
    chunk_path = f"chunk_{chunk_num + 1}.pdf"
    with open(chunk_path, "wb") as f:
        pdf_writer.write(f)

    # Submit chunk to Azure
    # (Must track chunk number for result reassembly)
    process_chunk(chunk_path, start_page, end_page)

# After all chunks processed:
# 1. Adjust page numbers (chunk 2 starts at page 2001, etc.)
# 2. Merge extracted data maintaining page order
# 3. Handle data loss at chunk boundaries (split tables/paragraphs)

Want to process chunks in parallel? Congratulations: you just made the rate-limit problem worse.
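Reassembly is where the subtle bugs live. Here’s a minimal sketch of re-mapping page numbers when merging chunk results; the result shape is a simplified stand-in, not Azure’s actual response model:

# Minimal reassembly sketch. `chunk_results` is a simplified stand-in for
# Azure's response: one dict per chunk, pages numbered from 1 within the chunk.
def merge_chunk_results(chunk_results, chunk_size=2000):
    merged_pages = []
    for chunk_num, result in enumerate(chunk_results):
        offset = chunk_num * chunk_size
        for page in result["pages"]:
            # Re-map chunk-local page numbers back to the original document
            merged_pages.append({**page, "page_number": page["page_number"] + offset})
    merged_pages.sort(key=lambda p: p["page_number"])
    return merged_pages

# Example: two chunks of a 2,500-page document
chunks = [
    {"pages": [{"page_number": 1, "text": "..."}]},  # chunk 1 covers pages 1-2000
    {"pages": [{"page_number": 1, "text": "..."}]},  # chunk 2 covers pages 2001-2500
]
print([p["page_number"] for p in merge_chunk_results(chunks)])  # [1, 2001]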
Large documents amplify every other issue: polling overhead, rate limits, and orchestration complexity.
Issue #5: Mysterious failures

Some files just… fail. No clear pattern. No useful error messages. The examples I’ve hit in production include files that succeed or fail depending on which optional features are enabled; the barcode case below is one of them.
Error messages are typically:
“An unexpected error occurred”
Which is about as actionable as it sounds.
Here’s one concrete example, using a test image containing barcodes:
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import DocumentAnalysisFeature

with open("barcodes.jpg", "rb") as f:
    image_bytes = f.read()

# Test WITH the barcode feature (client from the first example)
poller = client.begin_analyze_document(
    "prebuilt-read",
    body=image_bytes,
    features=[DocumentAnalysisFeature.BARCODES],
    content_type="application/octet-stream",
)
result = poller.result()  # fails with "An unexpected error occurred"

Remove the barcode feature, and the same file succeeds.
To survive in production you end up building retries, feature-level fallbacks, and error-classification layers around every call; a sketch of the fallback pattern follows below. All of this adds latency, cost, and complexity, and none of it is predictable.
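As one example, here’s a sketch of the feature-fallback wrapper for cases like the barcode file above; it’s a survival pattern, not official Azure guidance:

from azure.core.exceptions import HttpResponseError
from azure.ai.documentintelligence.models import DocumentAnalysisFeature

def analyze_with_fallback(client, document_bytes, features=None):
    """Try the full feature set first; on failure, retry with features stripped.
    A sketch of a survival pattern, not official Azure guidance."""
    attempts = [features or [], []]  # second attempt: no optional features
    last_error = None
    for feature_set in attempts:
        try:
            poller = client.begin_analyze_document(
                "prebuilt-read",
                body=document_bytes,
                features=feature_set or None,
                content_type="application/octet-stream",
            )
            return poller.result()
        except HttpResponseError as e:
            last_error = e  # typically just "An unexpected error occurred"
    raise last_error

# Usage: barcodes requested, but the call still succeeds if that feature breaks
# result = analyze_with_fallback(client, image_bytes,
#                                features=[DocumentAnalysisFeature.BARCODES])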
To be fair: document processing is hard. These issues aren’t unique to Azure. But Azure gives you very little visibility into why something failed.
Beyond the big failures, there’s also a long tail of small annoyances that don’t break systems but slowly drain your sanity.
Despite all this, it can be the right tool if your volumes are small, your workflows are simple, and you’re already invested in the Azure ecosystem.
Just go in knowing what you’ll need to build around it.
This isn’t meant to bash Azure. It’s a powerful platform with serious engineering behind it.
But once you operate at scale, many of the hardest problems aren’t accuracy — they’re architecture, limits, and operational complexity.
If you’re evaluating OCR vendors, these trade-offs matter before you’re locked in.
I wish I’d seen a post like this earlier.

Carmelo Juanes Rodríguez is the CTO and co-founder of Invofox. A former researcher at one of Spain’s leading engineering institutes, he is a full-stack developer specializing in web technologies. Since co-founding Invofox in 2022, he has led the engineering team in building a platform that serves over 100 software firms and processes tens of millions of business documents each year.