The Problems You’ll Run Into Using Azure Document Intelligence

Carmelo Juanes

Invofox CTO

1.30.2026

Disclaimer

I’m the CTO of a company that builds document parsing software, so yes — I’m biased and I definitely have a horse in this race.

That said, building in this space forced me to evaluate and stress-test basically every OCR, VLM, LLM, and “document AI” product I could get my hands on. This post isn’t a hit piece — it’s a technical rant after dealing with the same issues one too many times in production.

Azure Document Intelligence (formerly Form Recognizer) looks great on paper: managed OCR, prebuilt models, tight Azure integration. For small volumes and simple workflows, it mostly does what it says.

Things start breaking down once you push it into high-volume, latency-sensitive production workloads.

This post focuses specifically on Azure’s Read model — the core OCR engine that everything else builds on. I’m not covering custom or invoice-specific models, just the foundational OCR layer.

If you’re evaluating OCR vendors for real production usage, here are the issues you’ll almost certainly run into.

Issue #1: No Webhooks. Polling Is Mandatory.

Azure Document Intelligence has no webhook or callback support. Every async request must be polled.

Which, apparently, is still a thing in 2026.

You submit a document, get an operation ID back, and then repeatedly ask Azure whether it’s done yet. There’s no alternative.

Here’s the minimal polling loop Azure requires:

import os
import time
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

client = DocumentIntelligenceClient(
    endpoint=os.getenv("AZURE_DOCINTELLIGENCE_URL"),
    credential=AzureKeyCredential(os.getenv("AZURE_DOCINTELLIGENCE_KEY"))
)

with open("invoice.pdf", "rb") as f:
    document_bytes = f.read()

# Submit document - returns operation ID
poller = client.begin_analyze_document(
    "prebuilt-read",
    body=document_bytes,
    content_type="application/octet-stream"
)

# Must poll for results (no webhook callback available)
polling_attempts = 0
while not poller.done():
    polling_attempts += 1
    print(f"Poll attempt #{polling_attempts}: Checking status...")
    time.sleep(1)  # Azure recommends 1-2 second intervals

result = poller.result()

Azure recommends polling every 1–2 seconds. Poll faster and you’ll hit rate limits. Poll slower and latency goes up.

What this looks like in practice

I measured polling overhead across:

  • Small (25 KB, 1-page) documents
  • Large (1.6 MB, 100-page) PDFs
  • Conservative (1s) vs aggressive (100ms) polling

The results are unintuitive but consistent:

  • 75–90% of total processing time is spent polling, not OCR
  • Cutting polling intervals from 1s → 100ms:
    • Saves ~20–30% total time
    • Increases request volume 6–7×
  • Rate limits make aggressive polling unusable at scale

Polling burns your GET quota while doing no useful work.
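
If you want to reproduce this kind of comparison, the simplest way to control the polling interval precisely is to call the REST API yourself. Here's a rough sketch, with the caveat that the URL path, api-version, and status values are assumptions you should verify against your resource's documentation:

import os
import time
import requests

ENDPOINT = os.getenv("AZURE_DOCINTELLIGENCE_URL").rstrip("/")
KEY = os.getenv("AZURE_DOCINTELLIGENCE_KEY")
# NOTE: path and api-version are illustrative assumptions; check your resource's docs.
ANALYZE_URL = f"{ENDPOINT}/documentintelligence/documentModels/prebuilt-read:analyze?api-version=2024-11-30"

def timed_analyze(document_bytes, poll_interval):
    """Submit once, then poll at a fixed interval; return (wall_time_s, poll_count)."""
    start = time.time()
    resp = requests.post(
        ANALYZE_URL,
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/octet-stream"},
        data=document_bytes,
    )
    resp.raise_for_status()
    status_url = resp.headers["Operation-Location"]  # URL Azure tells you to poll

    polls = 0
    while True:
        time.sleep(poll_interval)
        polls += 1
        status = requests.get(
            status_url, headers={"Ocp-Apim-Subscription-Key": KEY}
        ).json()["status"]
        if status in ("succeeded", "failed"):
            return time.time() - start, polls

# Compare the same document at conservative vs aggressive intervals:
#   timed_analyze(doc_bytes, 1.0) vs timed_analyze(doc_bytes, 0.1)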

Why this matters architecturally

This isn’t something you can “optimize away”:

  • You waste compute waiting
  • You waste requests polling
  • You hit rate limits before CPU or throughput limits
  • Concurrency collapses under load

A webhook would eliminate all of this overhead. Polling isn’t a tuning problem — it’s an architectural limitation.
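
To put rough numbers on that waste, here's a back-of-the-envelope estimate of what polling costs per document (illustrative arithmetic only, assuming roughly the ~9-second processing time used in the next section):

def polling_overhead(processing_s, poll_interval_s):
    """Estimated GET requests per document, plus the average latency added
    because completion is only observed on the next poll."""
    gets_per_doc = processing_s / poll_interval_s
    added_latency_s = poll_interval_s / 2
    return gets_per_doc, added_latency_s

# A ~9-second document polled every 1s costs ~9 GETs and ~0.5s of extra latency.
# Polling every 100ms costs ~90 GETs to win back ~0.45s: request volume grows
# roughly an order of magnitude faster than the latency you save.
print(polling_overhead(9, 1.0))   # (9.0, 0.5)
print(polling_overhead(9, 0.1))   # (90.0, 0.05)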

Issue #2: Rate Limits Kill Horizontal Scaling

Azure’s standard tier rate limits are:

  • POST (analyze): 15 TPS
  • GET (polling): 50 TPS

Because polling is mandatory, GET becomes the real bottleneck.

With recommended polling:

  • Each document consumes ~1 GET/sec while processing
  • That caps you at ~50 concurrent documents per region

Now consider a very normal batch workload:

Process 5,000 documents in 10 minutes

That requires:

  • ~8.3 documents/sec submission
  • ~9 seconds average processing time
  • ~75 concurrent documents

That’s already 50% over what a single region can handle.
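
The arithmetic behind those numbers is just Little's law: concurrency is roughly arrival rate times time in system. A quick sanity check:

docs = 5_000
window_s = 10 * 60
avg_processing_s = 9

submission_rate = docs / window_s                       # ~8.3 documents/sec
concurrent_docs = submission_rate * avg_processing_s    # ~75 documents in flight
polling_gets_per_sec = concurrent_docs * 1.0            # ~1 GET/sec per in-flight doc

print(f"{submission_rate:.1f} docs/s, ~{concurrent_docs:.0f} concurrent, ~{polling_gets_per_sec:.0f} GET/s")
# ~75 GET/s of polling against a 50 GET/s regional limit: one region can't keep up.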

The workaround (spoiler: it’s ugly)

To scale, you must:

  • Deploy Document Intelligence in multiple regions
  • Often across multiple Azure subscriptions
  • Build a custom orchestrator that:
    • Load-balances POSTs
    • Tracks GET usage separately
    • Handles regional failures

Azure provides none of this out of the box.

This is roughly what multi-region orchestration ends up looking like:

import time
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Deploy Document Intelligence resources in multiple regions
REGIONS = [
    {"endpoint": "https://eastus-docint.cognitiveservices.azure.com/", "key": "key1", "name": "East US"},
    {"endpoint": "https://eastus2-docint.cognitiveservices.azure.com/", "key": "key3", "name": "East US 2"},
    {"endpoint": "https://westus-docint.cognitiveservices.azure.com/", "key": "key2", "name": "West US"},
]

class MultiRegionOrchestrator:
    def __init__(self, regions):
        self.regions = regions
        self.current_index = 0

        # Track requests per region to respect rate limits
        self.region_stats = {
            region["name"]: {"post_count": 0, "get_count": 0, "last_reset": time.time()}
            for region in regions
        }

    def get_next_client(self):
        """Round-robin load balancing across regions"""
        region = self.regions[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.regions)

        return (
            DocumentIntelligenceClient(
                endpoint=region["endpoint"],
                credential=AzureKeyCredential(region["key"])
            ),
            region["name"]
        )

    def process_document(self, document_bytes):
        """Process document with multi-region failover"""
        attempts = 0
        last_error = None

        while attempts < len(self.regions):
            try:
                client, region_name = self.get_next_client()

                # Track POST request
                self.region_stats[region_name]["post_count"] += 1

                # Submit document
                poller = client.begin_analyze_document(
                    "prebuilt-read",
                    body=document_bytes,
                    content_type="application/octet-stream"
                )

                # Poll for results (each poll is a GET request)
                while not poller.done():
                    self.region_stats[region_name]["get_count"] += 1
                    time.sleep(1)

                return poller.result()

            except Exception as e:
                last_error = e
                attempts += 1
                print(f"Region {region_name} failed, trying next region...")

        raise Exception(f"All {len(self.regions)} regions failed: {last_error}")

# Usage
orchestrator = MultiRegionOrchestrator(REGIONS)

# Process multiple documents across regions
for doc_path in document_list:
    with open(doc_path, 'rb') as f:
        doc_bytes = f.read()
    result = orchestrator.process_document(doc_bytes)
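
One gap in the sketch above: the counters record usage but never actually throttle anything. In practice you end up adding a per-region token bucket so polling can't silently blow through the GET budget. Here's a minimal sketch; the 15/50 TPS defaults simply mirror the standard-tier numbers quoted earlier and are assumptions to adjust:

import threading
import time

class RegionRateLimiter:
    """Tiny token bucket: blocks until a request slot is free in the current
    one-second window. The 15 POST / 50 GET budgets mirror the standard-tier
    limits quoted above; adjust them to whatever your resource actually allows."""
    def __init__(self, post_tps=15, get_tps=50):
        self._lock = threading.Lock()
        self._budget = {"post": post_tps, "get": get_tps}
        self._tokens = dict(self._budget)
        self._window_start = time.time()

    def acquire(self, kind):
        while True:
            with self._lock:
                if time.time() - self._window_start >= 1.0:  # refill once per second
                    self._tokens = dict(self._budget)
                    self._window_start = time.time()
                if self._tokens[kind] > 0:
                    self._tokens[kind] -= 1
                    return
            time.sleep(0.02)  # wait for the next window

# One limiter per region; call limiter.acquire("post") before each
# begin_analyze_document and limiter.acquire("get") before each poll.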

Issue #3: Silent Regional Degradation

In any multi-region deployment, there are usually one or two regions performing significantly worse than the others.

Not always the same regions — but always some.

What I’ve repeatedly seen in production:

  • Normal latency: ~5 seconds
  • Degraded region: 60+ seconds
  • Spikes in HTTP 500 errors, often in bursts
  • No corresponding signal in Azure’s error or health dashboards — error charts remain flat
  • A large number of HTTP 400 (InvalidRequest) responses that succeed immediately on retry with no modification to the request
    • These should clearly have been reported as server-side failures (500s), not client errors
  • Azure status page: “All services operational”
  • No alerts, no notifications

These slowdowns and error spikes can last hours.

The misleading error semantics make this particularly painful to debug:

  • HTTP 500s surge without visibility in Azure’s monitoring
  • HTTP 400s imply client-side issues, but retries succeed instantly, masking what are clearly transient backend failures

Because Issue #2 forces you into multi-region deployments, your system will happily continue routing traffic into degraded regions unless you actively detect and avoid them — compounding both latency and error rates.
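
Until the error semantics improve, the pragmatic workaround is to treat both 400s and 500s from this API as potentially transient and retry the identical request before giving up. A minimal sketch using azure-core's standard HttpResponseError (the retry count and backoff are assumptions, not recommendations):

import time
from azure.core.exceptions import HttpResponseError

def analyze_with_defensive_retry(client, document_bytes, max_retries=3):
    """Retry the identical request on 400/500/503 before surfacing the error,
    since 400 "InvalidRequest" responses here often succeed unchanged on retry."""
    for attempt in range(1, max_retries + 1):
        try:
            poller = client.begin_analyze_document(
                "prebuilt-read",
                body=document_bytes,
                content_type="application/octet-stream",
            )
            return poller.result()
        except HttpResponseError as e:
            if attempt == max_retries or e.status_code not in (400, 500, 503):
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff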

What you’re forced to build

If you care about latency or SLAs, you must implement:

  • Per-region latency tracking
  • Per-region error-rate tracking (independent of Azure’s dashboards)
  • Baselines and anomaly detection
  • Automatic regional failover
  • Alerting when a region degrades

This is monitoring and error-classification infrastructure Azure should provide — but doesn’t.
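
A minimal version of that per-region health tracking might look like the sketch below; the window size and thresholds are illustrative guesses, not tuned values:

import time
from collections import deque
from statistics import median

class RegionHealth:
    """Rolling latency and error tracking for one region, with a crude
    degradation check. Window size and thresholds are illustrative only."""
    def __init__(self, window=50):
        self.latencies = deque(maxlen=window)
        self.errors = deque(maxlen=window)

    def record(self, latency_s, ok):
        self.latencies.append(latency_s)
        self.errors.append(0 if ok else 1)

    def degraded(self, baseline_s=5.0):
        if len(self.latencies) < 10:
            return False  # not enough data to judge
        too_slow = median(self.latencies) > 3 * baseline_s
        too_many_errors = sum(self.errors) / len(self.errors) > 0.2
        return too_slow or too_many_errors

# In the orchestrator above: record() after every document, skip any region whose
# degraded() flips to True, and fire an alert, because Azure won't.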

Issue #4: The 2,000-Page Hard Limit

Azure has a hard cap:

  • 2,000 pages per document (paid tier)
  • Anything larger must be manually chunked

This is not an edge case.

Real-world documents that routinely exceed this:

  • Mortgage packages
  • Legal discovery
  • Financial disclosures

Chunking sounds simple until you do it:

  • Split PDFs
  • Track page offsets
  • Reassemble results
  • Deal with tables and paragraphs split across chunks

Here’s a simplified chunking example:

from pypdf import PdfReader, PdfWriter

AZURE_PAGE_LIMIT = 2000  # Azure's hard limit
CHUNK_SIZE = AZURE_PAGE_LIMIT  # use the maximum allowed chunk size

# Read large document
pdf_reader = PdfReader("large_mortgage_package.pdf")
total_pages = len(pdf_reader.pages)  # e.g., 2,500 pages

# Calculate required chunks
chunks_needed = (total_pages + CHUNK_SIZE - 1) // CHUNK_SIZE  # 2 chunks for 2,500 pages

# Split into chunks
for chunk_num in range(chunks_needed):
    start_page = chunk_num * CHUNK_SIZE
    end_page = min(start_page + CHUNK_SIZE, total_pages)

    # Create chunk PDF
    pdf_writer = PdfWriter()
    for page_idx in range(start_page, end_page):
        pdf_writer.add_page(pdf_reader.pages[page_idx])

    # Save and process chunk
    chunk_path = f"chunk_{chunk_num + 1}.pdf"
    with open(chunk_path, 'wb') as f:
        pdf_writer.write(f)

    # Submit chunk to Azure
    # (Must track chunk number for result reassembly)
    process_chunk(chunk_path, start_page, end_page)

# After all chunks processed:
# 1. Adjust page numbers (chunk 2 starts at page 2001, etc.)
# 2. Merge extracted data maintaining page order
# 3. Handle data loss at chunk boundaries (split tables/paragraphs)
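
Step 1 of that reassembly is mostly arithmetic. Here's a sketch of re-numbering pages across chunks; the "pages"/"pageNumber" field names assume the REST-style dict you get from result.as_dict(), so adjust if you work with the SDK model objects directly:

def merge_chunk_results(chunk_results, chunk_size=2000):
    """Offset page numbers so chunk 2 continues at page 2001, then concatenate.
    Assumes each entry is an analyze result serialized to a dict with a "pages" list."""
    merged_pages = []
    for chunk_idx, result in enumerate(chunk_results):
        offset = chunk_idx * chunk_size
        for page in result.get("pages", []):
            page["pageNumber"] += offset
            merged_pages.append(page)
    return merged_pages

# Boundary artifacts (tables or paragraphs split across pages 2000/2001) still
# need manual stitching; renumbering alone doesn't fix them.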

Want to process chunks in parallel? Congratulations — you just made the rate-limit problem worse.

Large documents amplify every other issue: polling overhead, rate limits, and orchestration complexity.

Issue #5: Unpredictable File Failures

Some files just… fail.

No clear pattern. No useful error messages.

Examples I’ve hit in production:

  • PDFs that open fine everywhere → “corrupted”
  • Images that render correctly → “invalid format”
  • Barcode extraction failing entire documents
  • Encrypted PDFs failing even after unlocking

Error messages are typically:

“An unexpected error occurred”

Which is about as actionable as it sounds.

Here’s one concrete example, using a test image containing barcodes:

import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import DocumentAnalysisFeature

# Same client setup as in the first example
client = DocumentIntelligenceClient(
    endpoint=os.getenv("AZURE_DOCINTELLIGENCE_URL"),
    credential=AzureKeyCredential(os.getenv("AZURE_DOCINTELLIGENCE_KEY"))
)

with open("barcodes.jpg", "rb") as f:
    image_bytes = f.read()

# Test WITH barcode feature
poller = client.begin_analyze_document(
    "prebuilt-read",
    body=image_bytes,
    features=[DocumentAnalysisFeature.BARCODES],
    content_type="application/octet-stream"
)

result = poller.result()

Remove the barcode feature, and the same file succeeds.

The cost of workarounds

To survive in production you end up building:

  • Feature-flag retries (wasting requests)
  • PDF repair pipelines (Ghostscript, Poppler, ImageMagick)
  • Image re-encoding and metadata stripping
  • Fallback processing paths

All of this adds latency, cost, and complexity — and none of it is predictable.
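
Concretely, the fallback chain tends to look something like the sketch below. The Ghostscript rewrite is a common repair trick, not anything Azure documents, and the feature list just reuses the barcode example from above:

import subprocess
from azure.ai.documentintelligence.models import DocumentAnalysisFeature

def analyze_with_fallbacks(client, path):
    """Try the full request, then progressively degrade: drop optional add-on
    features, then let Ghostscript rewrite the PDF and retry one last time."""
    with open(path, "rb") as f:
        data = f.read()

    attempts = [
        {"features": [DocumentAnalysisFeature.BARCODES]},  # full request
        {},                                                 # same request, no add-ons
    ]
    for extra_kwargs in attempts:
        try:
            poller = client.begin_analyze_document(
                "prebuilt-read", body=data,
                content_type="application/octet-stream", **extra_kwargs)
            return poller.result()
        except Exception:
            continue  # fall through to the next, more conservative attempt

    repaired = path + ".repaired.pdf"
    subprocess.run(["gs", "-o", repaired, "-sDEVICE=pdfwrite", path], check=True)
    with open(repaired, "rb") as f:
        poller = client.begin_analyze_document(
            "prebuilt-read", body=f.read(),
            content_type="application/octet-stream")
    return poller.result()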

To be fair: document processing is hard. These issues aren’t unique to Azure. But Azure gives you very little visibility into why something failed.

Smaller (But Painful) Issues

A few things that don’t break systems, but slowly drain your sanity:

  • Pricing opacity
    • Pricing is per-page and feature-based, but responses don’t include cost information. If you want accurate accounting, you have to track it yourself.
    • (Ask me how I accidentally left a font-styling add-on enabled during an evaluation — they later refunded it in credits.)
  • Studio vs API mismatches
    • “It works in the Studio” often does not mean it works via the API. Different defaults, versions, and parameters lead to misleading POCs.
  • Breaking SDK changes
    • API and SDK upgrades regularly introduce breaking changes, forcing codebase-wide migrations and accuracy re-validation.

Who Azure Document Intelligence Is Good For

Despite all this, it can be the right tool if:

  • Your volumes are low
  • Latency isn’t critical
  • Documents are small and simple
  • You’re deeply embedded in Azure
  • You have strong internal platform teams

Just go in knowing what you’ll need to build around it.

Final Thoughts

This isn’t meant to bash Azure. It’s a powerful platform with serious engineering behind it.

But once you operate at scale, many of the hardest problems aren’t accuracy — they’re architecture, limits, and operational complexity.

If you’re evaluating OCR vendors, these trade-offs matter before you’re locked in.

I wish I’d seen a post like this earlier.

Carmelo Juanes

Guest Contributor

Carmelo Juanes Rodríguez is the CTO and co-founder of Invofox. A former researcher at one of Spain’s leading engineering institutes, he is a full-stack developer specializing in web technologies. Since co-founding Invofox in 2022, he has led the engineering team in building a platform that serves over 100 software firms and processes tens of millions of business documents each year.
