Using GPT-4o (ChatGPT) API
OpenAI’s GPT-4o can extract structured information when prompted correctly, but unlike Invofox, it cannot directly read PDF files in this workflow. Text must first be extracted with a library such as pdfplumber (for PDFs that contain a text layer) or an OCR engine such as Tesseract (for scanned pages), then sent to GPT-4o in an API prompt.
You will need an OpenAI API key. Create a .env file with this convention:
OPENAI_API_KEY=your_api_key
Step 1: Set up your Python environment
Create a virtual environment to isolate dependencies:
# macOS / Linux:
python3 -m venv env
source env/bin/activate
# Windows:
python -m venv env
.\env\Scripts\activate
Step 2: Install required packages
Install the necessary libraries:
pip install pdfplumber openai python-dotenv
After installing, generate a requirements.txt file:
pip freeze > requirements.txt
Add a .gitignore to avoid committing the virtual environment and the .env file that holds your API key.
Step 3: Extract text and parse with GPT-4o
Here’s the complete code in openai-main.py:
import pdfplumber
import openai
from dotenv import load_dotenv
import os
load_dotenv()
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
def extract_text_from_pdf(pdf_path):
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
    return text
def parse_invoice_with_openai(invoice_text):
    prompt = (
        "Extract the following fields from this invoice text and return as a JSON object:\n"
        "- Invoice Number\n"
        "- Invoice Date\n"
        "- Due Date\n"
        "- Invoice Status (e.g. unpaid/paid)\n"
        "- Sender Name and Email\n"
        "- Recipient Name and Email\n"
        "- Items (description, quantity, rate)\n"
        "- Total Amount\n"
        "- Memo\n\n"
        "Invoice Text:\n"
        f"{invoice_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # the model this section describes
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=500,
        temperature=0,
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    pdf_path = "invoice_sample.pdf"
    invoice_text = extract_text_from_pdf(pdf_path)
    parsed_data = parse_invoice_with_openai(invoice_text)
    print(parsed_data)
The extract_text_from_pdf function uses pdfplumber to read each page and concatenate text. The parse_invoice_with_openai function sends the text to GPT-4o asking for JSON output.
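The prompt asks for JSON, but nothing in the request itself enforces it. One option, sketched here, is the Chat Completions `response_format={"type": "json_object"}` setting (JSON mode), which asks the model to return a single syntactically valid JSON object. The helper name `build_invoice_request` and the shortened prompt are illustrative and not part of the original script. The helper only builds the keyword arguments, so it runs without an API key.

```python
def build_invoice_request(invoice_text, model="gpt-4o"):
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    prompt = (
        "Extract the invoice fields and return them as a JSON object.\n\n"
        "Invoice Text:\n"
        f"{invoice_text}"
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You extract invoice data and reply in JSON."},
            {"role": "user", "content": prompt},
        ],
        # JSON mode: the reply should be one syntactically valid JSON object.
        # The messages must mention "JSON" for this mode to be accepted.
        "response_format": {"type": "json_object"},
        "max_tokens": 500,
        "temperature": 0,
    }


request = build_invoice_request("Invoice Number: 2-7-25")
print(request["response_format"])
```

In the script above, this would be used as `client.chat.completions.create(**build_invoice_request(invoice_text))`. JSON mode only constrains syntax; it does not guarantee that every requested field is present or correct.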
Step 4: Output
Running python openai-main.py produces JSON output with extracted fields:
{
  "Invoice Number": "2-7-25",
  "Invoice Date": "July 2, 2025",
  "Due Date": "Upon receipt",
  "Invoice Status": "UNPAID",
  "Sender Name and Email": {
    "Name": "Anmol Baranwal",
    "Email": "hi@anmolbaranwal.com"
  },
  "Recipient Name and Email": {
    "Name": "Anmol Baranwal",
    "Email": "anmolbaranwal09@gmail.com"
  },
  "Line Items": [
    { "Description": "Testing", "Quantity": 1, "Rate": "$50.00", "Total": "$50.00" },
    { "Description": "Development", "Quantity": 1, "Rate": "$100.00", "Total": "$100.00" },
    { "Description": "Blog", "Quantity": 1, "Rate": "$50.00", "Total": "$50.00" }
  ],
  "Subtotal": "$200.00",
  "Total Amount": "$200.00",
  "Memo or Notes": "Thank you! This is a sample invoice for testing document parsing with AI models."
}
Pros:
- Easy to try and flexible
- GPT-4o handles reasoning and structured data extraction well
- Can correctly identify invoice fields and calculate totals
Cons:
- Requires prompt engineering and output verification
- JSON can be malformed or miss fields (hallucinations possible)
- No built-in validation or confidence scores
- Outputs vary by prompt style
- Sending all text in prompts can be costly for large documents
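Because the reply is plain text, it is worth parsing and checking it before using it. The following is a minimal sketch of that verification step, assuming the reply is either bare JSON or JSON wrapped in a Markdown code fence. The function name `parse_model_json` and the `REQUIRED_KEYS` subset are illustrative choices, not part of the original script.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker
REQUIRED_KEYS = {"Invoice Number", "Invoice Date", "Total Amount"}  # illustrative subset


def parse_model_json(raw, required_keys=REQUIRED_KEYS):
    """Parse a model reply into a dict and list the required keys it lacks."""
    text = raw.strip()
    # Models sometimes wrap JSON in a code fence; drop the opening and closing fence lines.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = sorted(set(required_keys) - set(data))
    return data, missing


reply = FENCE + 'json\n{"Invoice Number": "2-7-25", "Total Amount": "$200.00"}\n' + FENCE
data, missing = parse_model_json(reply)
print(missing)  # ['Invoice Date']
```

A non-empty `missing` list or a `JSONDecodeError` is a signal to retry the request or route the document to manual review.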
Cost estimate for 1–2 page invoice extraction: $0.005–$0.018, depending on prompt detail. Response time: 1–30 seconds, subject to load spikes.
Using Claude 3.5 Sonnet API
Anthropic’s Claude 3.5 Sonnet can parse structured data from text when prompted correctly. Like GPT-4o, it cannot read PDF files directly via API, so text must be extracted first.
You will need an Anthropic API key:
ANTHROPIC_API_KEY=your_api_key
Step 1: Set up environment and install packages
Create a virtual environment:
# macOS / Linux:
python3 -m venv env
source env/bin/activate
# Windows:
python -m venv env
.\env\Scripts\activate
Install required libraries:
pip install pdfplumber anthropic python-dotenv
Then export to requirements.txt and add .gitignore.
Step 2: Extract text and parse with Claude 3.5 Sonnet
Create anthropic-main.py:
import pdfplumber
import anthropic
import os
from dotenv import load_dotenv
load_dotenv()
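api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
    # Fail early with a clear message instead of printing part of the secret.
    raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to your .env file.")
client = anthropic.Anthropic(api_key=api_key)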
def extract_text_from_pdf(pdf_path):
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() returns None for pages without a text layer
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
    return text
def parse_invoice_with_claude(invoice_text):
    prompt = (
        "Extract the following fields from this invoice text and return as a JSON object:\n"
        "- Invoice Number\n"
        "- Invoice Date\n"
        "- Due Date\n"
        "- Invoice Status (e.g. unpaid/paid)\n"
        "- Sender Name and Email\n"
        "- Recipient Name and Email\n"
        "- Items (description, quantity, rate)\n"
        "- Total Amount\n"
        "- Memo\n\n"
        "Invoice Text:\n"
        f"{invoice_text}"
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=500,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
if __name__ == "__main__":
    pdf_path = "invoice_sample.pdf"
    invoice_text = extract_text_from_pdf(pdf_path)
    parsed_data = parse_invoice_with_claude(invoice_text)
    print(parsed_data)
Step 3: Output
Running python anthropic-main.py produces:
{
  "invoiceNumber": "2-7-25",
  "invoiceDate": "July 2, 2025",
  "dueDate": "Upon receipt",
  "invoiceStatus": "UNPAID",
  "senderName": "Anmol Baranwal",
  "senderEmail": "hi@anmolbaranwal.com",
  "recipientName": "Anmol Baranwal",
  "recipientEmail": "anmolbaranwal09@gmail.com",
  "items": [
    { "description": "Testing", "quantity": 1, "rate": 50.00 },
    { "description": "Development", "quantity": 1, "rate": 100.00 },
    { "description": "Blog", "quantity": 1, "rate": 50.00 }
  ],
  "totalAmount": 200.00,
  "memo": "Thank you! This is a sample invoice for testing document parsing with AI models."
}
Pros:
- Claude 3.5 is strong at understanding long text and formatting it cleanly
- Can handle text and images in prompts
- Sometimes handles unusual or long documents better than GPT-4o
Cons:
- Requires prompt engineering, like GPT-4o
- Can miss fields or hallucinate values
- Returns raw JSON text without validation
- Must manually extract PDF text first
Cost estimate: $0.005–$0.018 per document. Response time: 200–300ms best case, up to 10+ seconds for larger prompts.
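The GPT-4o and Claude outputs shown in these two sections describe the same invoice but use different key styles ("Invoice Number" versus "invoiceNumber"). If both models feed the same downstream code, one way to reduce the mismatch is to normalize key casing, as in this sketch. The function names are illustrative. It only unifies casing and spacing, so structural differences (for example "Line Items" versus "items", or nested versus flat sender fields) would still need an explicit mapping.

```python
import re


def to_camel_case(key):
    """Convert a key such as 'Invoice Number' to 'invoiceNumber'."""
    words = [w for w in re.split(r"[\s_\-]+", key.strip()) if w]
    if not words:
        return key
    first, rest = words[0], words[1:]
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


def normalize_keys(value):
    """Recursively rename dict keys to camelCase; lists and scalars pass through."""
    if isinstance(value, dict):
        return {to_camel_case(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


gpt_style = {
    "Invoice Number": "2-7-25",
    "Line Items": [{"Description": "Testing", "Quantity": 1}],
}
print(normalize_keys(gpt_style))
# {'invoiceNumber': '2-7-25', 'lineItems': [{'description': 'Testing', 'quantity': 1}]}
```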
Using the Invofox API
Invofox is a Y Combinator-backed startup built specifically for document parsing. It uses specialized models tuned for invoices and other documents, unlike general-purpose LLMs.
Key features include:
- Splitter: Automatically separates multiple documents in a single PDF (e.g., mixed invoices), grouping pages into logical sub-documents for better extraction.
- Classifier: Pretrained AI model that detects document types (invoice, receipt, etc.) so each document is processed using the correct schema.
- Verification and autocompletion: Advanced AI models with proprietary algorithms verify and autocomplete data.
Step 1: Sign up for the dashboard
Create an account on the Invofox dashboard and generate an API key.
Step 2: Creating the request in Postman
Once you have your API key, use Postman to send documents to Invofox’s /uploads endpoint.
1. Create a new request
- Open Postman and create a collection with a new request
- Set method to POST
- Request endpoint: https://api.invofox.com/v1/ingest/uploads
2. Set the headers
Add these headers:
accept: application/json
x-api-key: your_invofox_api_key
Do not manually set Content-Type; Postman handles it automatically with form-data.
3. Add the body (form-data)
Switch to Body tab, select form-data and add:
- key: files, type: file, value: upload your invoice PDF
- key: info, type: text, value:
{
  "type": "6840c4511cbcc77119347248",
  "data": {
    "companyActsLike": "issuer"
  }
}
The data field is optional for custom metadata or parsing instructions.
4. Send the request
Clicking “Send” returns a response with details:
{
  "accountId": "683edb9d7ded4695232c4979",
  "environmentId": "683edb9d7ded4695232c497b",
  "importId": "68662d83c3a0849a86a6aa30",
  "files": [
    {
      "id": "68662d83c3a0849a86a6aa33",
      "filename": "invoice_sample.pdf",
      "documentId": "68662d83c3a0849a86a6aa34"
    }
  ]
}
- importId: Batch ID for tracking multiple uploads
- documentId: ID of the parsed document
Step 3: Get parsed document
Make a GET request to https://api.invofox.com/documents/{documentId} with headers:
accept: application/json
x-api-key: your_invofox_api_key
The response includes the original document image and extracted data with many additional fields compared to GPT or Claude outputs.
Python code using Invofox API
Install the requests library:
pip install requests
Create invofox-main.py:
import requests
import os
import json
from dotenv import load_dotenv
import time
load_dotenv()
API_BASE = "https://api.invofox.com"
API_KEY = os.getenv("INVOFOX_API_KEY")
PDF_PATH = "invoice_sample.pdf"
headers = {"accept": "application/json", "x-api-key": API_KEY}
with open(PDF_PATH, "rb") as f:
    files = {"files": f}
    info = {"type": "6840c4511cbcc77119347248", "data": {"companyActsLike": "issuer"}}
    data = {"info": json.dumps(info)}
    # The upload must happen while the file is still open
    resp_upload = requests.post(
        f"{API_BASE}/v1/ingest/uploads", headers=headers, files=files, data=data
    )
upload_result = resp_upload.json()
print("Upload response:", upload_result)
import_id = upload_result.get("importId")
if not import_id:
    raise ValueError("Import ID not found in upload response.")
# wait a moment for processing
time.sleep(2)
resp_import = requests.get(f"{API_BASE}/v1/ingest/imports/{import_id}", headers=headers)
import_info = resp_import.json()
print("Import info:", import_info)
files_info = import_info.get("files", [])
if not files_info or not files_info[0].get("documentIds"):
    raise ValueError("Document IDs not found in import info.")
document_id = files_info[0]["documentIds"][0]
print("Document ID:", document_id)
time.sleep(20)
resp_get = requests.get(f"{API_BASE}/documents/{document_id}", headers=headers)
parsed_doc = resp_get.json()
print("Parsed Document Data:")
print(json.dumps(parsed_doc, indent=2))
The three main API endpoints used:
- POST /v1/ingest/uploads → Uploads a PDF with metadata, returns importId
- GET /v1/ingest/imports/{importId} → Retrieves documentIds from importId
- GET /documents/{documentId} → Retrieves fully parsed invoice data
Running python invofox-main.py returns the parsed document with all fields correctly extracted and validated.
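The script waits with fixed `time.sleep` calls, which can be too short for slow documents or needlessly long for fast ones. A generic polling loop is one alternative, sketched below. The helper `poll_until` and its parameters are hypothetical, and what counts as a "ready" response is an assumption here: the example reuses the script's own check for `documentIds`, and fake responses stand in for the API so the sketch runs offline.

```python
import time


def poll_until(fetch, is_ready, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_ready(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        # Stop if another wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("result was not ready before the timeout")
        sleep(interval)


# Fake import responses: the first has no files yet, the second has a document ID.
responses = iter([{"files": []}, {"files": [{"documentIds": ["abc"]}]}])
result = poll_until(
    fetch=lambda: next(responses),
    is_ready=lambda r: bool(r.get("files")) and bool(r["files"][0].get("documentIds")),
    interval=0.01,
)
print(result["files"][0]["documentIds"][0])  # abc
```

In the script, `fetch` could wrap the existing call, for example `lambda: requests.get(f"{API_BASE}/v1/ingest/imports/{import_id}", headers=headers).json()`. How the document endpoint signals that parsing has finished is not covered here, so check the Invofox documentation before relying on a specific readiness test.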
Results & comparison
API call methods:
- GPT-4o/Claude → Send text with prompt
- Invofox → Use API or upload file (image/PDF) in bulk
Setup:
- GPT/Claude → Requires prompt engineering code
- Invofox → Minimal code, no prompt needed
Validation:
- GPT/Claude → Manual verification required
- Invofox → Built-in validation and confidence scores
Performance:
- GPT/Claude → Limited by token/window size
- Invofox → Handles multi-page docs via backend OCR and AI
Key findings:
- ChatGPT (GPT-4o): Good at parsing known fields if prompted clearly. You get JSON but must parse/clean it. Errors occur if prompts are unclear.
- Claude 3.5 (Sonnet): Similar to GPT-4o. Handles invoice fields well, sometimes better at recognizing unfamiliar terms. Still requires prompt tuning.
- Invofox API: Returns fully parsed invoice JSON out-of-the-box. All fields correctly extracted and validated with exactly the needed schema and no extra coding.
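For the LLM outputs, part of the manual verification can be automated. One simple check, sketched here, recomputes the total from the line items and compares it with the stated total. The function names are illustrative, only a dollar sign and thousands separators are stripped from amounts, and taxes or discounts are not modeled, so treat it as a starting point.

```python
from decimal import Decimal


def to_decimal(amount):
    """Parse '$1,200.50' or 200.0 into a Decimal."""
    if isinstance(amount, str):
        amount = amount.replace("$", "").replace(",", "").strip()
    return Decimal(str(amount))


def total_matches(items, total):
    """Return True when the sum of quantity * rate equals the stated total."""
    computed = sum(
        (to_decimal(item["quantity"]) * to_decimal(item["rate"]) for item in items),
        Decimal("0"),
    )
    return computed == to_decimal(total)


# Line items and total from the Claude output shown earlier.
items = [
    {"description": "Testing", "quantity": 1, "rate": 50.00},
    {"description": "Development", "quantity": 1, "rate": 100.00},
    {"description": "Blog", "quantity": 1, "rate": 50.00},
]
print(total_matches(items, 200.00))  # True
```

`Decimal` is used instead of `float` so that currency amounts compare exactly.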
Comparison table
| Feature | Invofox | GPT-4o | Claude 3.5 Sonnet |
|---|---|---|---|
| No-code setup | ✅ | ❌ (code/prompt required) | ❌ (code/prompt required) |
| Direct PDF/image upload | ✅ | ❌ (text-only) | ❌ (text-only) |
| Custom document type support | ✅ (custom IDs) | ✅ (via prompt) | ✅ (via prompt) |
| Fixed JSON schema | ✅ | ❌ (varies by prompt) | ❌ (varies by prompt) |
| Consistent field naming | ✅ | ❌ (“Line Items” variants) | ❌ |
| Built-in validation | ✅ | ❌ | ❌ |
| Confidence scores | ✅ | ❌ | ❌ |
| Auto-completion | ✅ | ❌ | ❌ |
| Human review optional | ✅ | ❌ | ❌ |
| Workflow orchestration | ✅ (API/hooks/dashboard) | ❌ | ❌ |
| Multi-language OCR | ✅ | ❌ (needs external OCR) | ❌ (needs external OCR) |
| Multi-page document support | ✅ | ❌ | ❌ |
| Processing speed | ✅ (fast, <5s) | ✅ (fastest) | ⚠️ (slower) |
Cost & execution time benchmarks
| Tool | Cost Structure | Typical Cost (per doc) | Typical Execution Time | Notes |
|---|---|---|---|---|
| Invofox | Per document, usage-based, no fixed fees | Custom / Not public. (Free trial, then custom) | <5s per document | Built for production, price via sales |
| GPT-4o | $2.50/million input tokens, $10/million output | $0.005–$0.015 (1,000–2,000 tokens) | 1–30s per request, can vary by load | Price = input tokens × input rate + output tokens × output rate |
| Claude 3.5 Sonnet | $3/million input tokens, $15/million output | $0.006–$0.018 (1,000–2,000 tokens) | ~200–300ms best-case, up to 10s+ | API is fast, but batch/huge prompts increase time |
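The per-document figures follow from the token prices in the table. As a worked example, the sketch below computes the cost of one request, assuming 1,000 input tokens and 500 output tokens (the `max_tokens` value used in the scripts). The request size is an assumption for illustration; real invoices will vary.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one request given per-million-token prices."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# (input price, output price) in USD per million tokens, taken from the table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

# Assumed request size: 1,000 input tokens and 500 output tokens.
for name, (in_price, out_price) in PRICES.items():
    print(f"{name}: ${estimate_cost(1000, 500, in_price, out_price):.4f}")
# GPT-4o: $0.0075
# Claude 3.5 Sonnet: $0.0105
```

Both results fall inside the ranges listed in the table. Input and output tokens are priced separately, so longer invoices mainly raise the input side of the bill.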
Consider ongoing costs of upgrading language models. Teams often need to benchmark new models, retest prompts, adjust schemas, and modify parsing logic when new versions release. These hidden maintenance costs are significant. With Invofox, no such requirement exists.
Bottom line
For quick experiments or one-off tasks, GPT-4o (via the OpenAI API) or Claude 3.5 Sonnet work reasonably well once you craft suitable prompts. They produce structured JSON output competently.
However, for reliable production-grade parsing of invoices or receipts, the Invofox API is superior. It’s specifically built for documents using advanced proprietary models and continual feedback loops.
Complete code is available in the GitHub Repository.