Claude Batch API Guide 2026: Process Thousands of Prompts at 50% Lower Cost
Running the Claude API at scale gets expensive fast. If you're processing hundreds or thousands of prompts for data analysis, content generation, or document classification, the standard API bill adds up quickly. The Claude Batch API is Anthropic's answer: send large volumes of requests asynchronously and get results at 50% lower cost per token.
This guide explains how the Batch API works, when to use it, and includes copy-paste Python and TypeScript examples so you can start saving immediately.
What is the Claude Batch API?
The Claude Batch API is an asynchronous endpoint that lets you submit large groups of AI requests at once and retrieve results when they're ready — at 50% lower cost than the standard API. It is designed for workloads where users are not waiting for an instant response.
The standard Claude API is synchronous: each request stays open until its response comes back. This is perfect for chatbots and interactive tools. But if you're running AI across 10,000 product descriptions, you don't need instant results. You just need results.
The Batch API changes the model: you queue up all your requests in one call, Anthropic processes them in the background, and you retrieve the results when they're done. The trade-off is latency (up to 24 hours) — and the reward is a 50% discount on every token.
How It Works — Step by Step
- Create a batch: POST a list of up to 100,000 requests (256 MB total) to the Batch API endpoint.
- Receive a batch ID: Anthropic queues your job and returns a unique batch ID immediately.
- Poll for status: Check the batch status periodically using the batch ID.
- Retrieve results: When the batch is complete, download results using the retrieval endpoint.
- Process output: Each result carries your custom_id, so you know which response matches which request.
Assign a custom_id to each request (like a product ID or row number). This makes it easy to map results back to your dataset.
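The custom_id round trip can be sketched without any network calls. This is a minimal illustration, not the SDK's own workflow: the `rows` dataset, its `sku` key, and the `build_requests` and `join_results` helpers are all hypothetical names, and `fake_results` is a hand-written placeholder standing in for real batch output.

```python
from typing import Dict, List

MODEL = "claude-sonnet-4-5"  # same model ID used in the examples later in this guide


def build_requests(rows: List[dict]) -> List[dict]:
    """Turn dataset rows into batch request entries keyed by custom_id."""
    return [
        {
            "custom_id": row["sku"],  # hypothetical primary key for each row
            "params": {
                "model": MODEL,
                "max_tokens": 150,
                "messages": [
                    {"role": "user", "content": f"Describe this product: {row['name']}"}
                ],
            },
        }
        for row in rows
    ]


def join_results(rows: List[dict], outputs: Dict[str, str]) -> List[dict]:
    """Attach each output to its source row; rows without a result get None."""
    return [dict(row, description=outputs.get(row["sku"])) for row in rows]


rows = [
    {"sku": "prod_001", "name": "Wireless Headphones"},
    {"sku": "prod_002", "name": "Laptop Stand"},
]
requests = build_requests(rows)

# Placeholder outputs, deliberately out of order; a real run would fill this
# dictionary from the batch results.
fake_results = {
    "prod_002": "A sturdy aluminium stand.",
    "prod_001": "Crisp wireless audio.",
}
enriched = join_results(rows, fake_results)
print(enriched[0]["sku"], "->", enriched[0]["description"])
```

Keying the join on custom_id rather than on list position matters because batch results are not guaranteed to come back in the order you submitted them.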
Batch API Pricing Comparison
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Claude Opus 4 | $15 / MTok | $7.50 / MTok | $75 / MTok | $37.50 / MTok |
| Claude Sonnet 4 | $3 / MTok | $1.50 / MTok | $15 / MTok | $7.50 / MTok |
| Claude Haiku 3.5 | $0.80 / MTok | $0.40 / MTok | $4 / MTok | $2 / MTok |
MTok = 1 million tokens. Batch pricing is always exactly 50% of standard pricing for both input and output tokens.
Real Cost Example
Suppose you need to classify 50,000 customer support tickets using Claude Sonnet 4. Each ticket is ~200 tokens in, ~50 tokens out.
- Standard API: (50,000 × 200 × $3 + 50,000 × 50 × $15) / 1,000,000 = $67.50
- Batch API: (50,000 × 200 × $1.50 + 50,000 × 50 × $7.50) / 1,000,000 = $33.75
- Savings: $33.75 per run
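The same arithmetic can be wrapped in a small helper for your own volumes. This is a rough sketch: `estimate_cost` is a hypothetical function, the prices are the Claude Sonnet 4 figures from the table above, and it assumes every request uses the same number of tokens, which real workloads will not.

```python
def estimate_cost(
    n_requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    batch: bool = False,
) -> float:
    """Estimated cost in USD; prices are dollars per million tokens (MTok)."""
    total_input = n_requests * input_tokens
    total_output = n_requests * output_tokens
    cost = (total_input * input_price + total_output * output_price) / 1_000_000
    # Batch pricing is half of standard pricing for input and output tokens.
    return cost * 0.5 if batch else cost


standard = estimate_cost(50_000, 200, 50, input_price=3.00, output_price=15.00)
batched = estimate_cost(50_000, 200, 50, input_price=3.00, output_price=15.00, batch=True)
print(f"standard=${standard:.2f} batch=${batched:.2f} savings=${standard - batched:.2f}")
# prints: standard=$67.50 batch=$33.75 savings=$33.75
```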
Best Use Cases for the Batch API
| Use Case | Why Batch Works | Example |
|---|---|---|
| Content generation at scale | No user waiting for output | Generate 5,000 product descriptions overnight |
| Document classification | Offline, background processing | Classify 100,000 support emails by category |
| Data enrichment | Large dataset, no real-time need | Add AI-generated summaries to a CRM database |
| Sentiment analysis | Batch analytics jobs | Analyze 20,000 customer reviews per week |
| Translation pipelines | Scheduled overnight runs | Translate product catalog to 8 languages |
| Training data generation | Generate synthetic examples for ML models | Create 10,000 Q&A pairs from documents |
Python Example
Install the Anthropic SDK: pip install anthropic
Step 1 — Create a Batch
```python
import anthropic

# With no arguments, the client reads the key from the ANTHROPIC_API_KEY
# environment variable; pass api_key="..." if you store it elsewhere.
client = anthropic.Anthropic()

# Source data: one batch request will be built per product
products = [
    {"id": "prod_001", "name": "Wireless Headphones", "specs": "40mm drivers, ANC, 30h battery"},
    {"id": "prod_002", "name": "Laptop Stand", "specs": "Aluminium, adjustable height, 10kg capacity"},
    {"id": "prod_003", "name": "USB-C Hub", "specs": "7-in-1, 4K HDMI, 100W PD, USB 3.0"},
]

# Build a list of requests
requests = []
for product in products:
    requests.append({
        "custom_id": product["id"],  # used later to match results to products
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 150,
            "messages": [{
                "role": "user",
                "content": f"Write a compelling 2-sentence product description for: {product['name']}. Specs: {product['specs']}"
            }]
        }
    })

# Submit the batch
batch = client.messages.batches.create(requests=requests)
print(f"Batch created: {batch.id}")
print(f"Status: {batch.processing_status}")
```
Step 2 — Poll for Completion
```python
import time

batch_id = batch.id

# processing_status is "in_progress", "canceling", or "ended".
# Per-request outcomes (succeeded, errored, canceled, expired) are reported
# in the results, not in the batch status.
while True:
    batch_status = client.messages.batches.retrieve(batch_id)
    status = batch_status.processing_status
    if status == "ended":
        print("Batch complete!")
        print(f"Request counts: {batch_status.request_counts}")
        break
    print(f"Status: {status}, waiting 30 seconds...")
    time.sleep(30)
```
Step 3 — Retrieve Results
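```python
results = {}

for result in client.messages.batches.results(batch_id):
    custom_id = result.custom_id
    if result.result.type == "succeeded":
        text = result.result.message.content[0].text
        results[custom_id] = text
        print(f"{custom_id}: {text[:80]}...")
    elif result.result.type == "errored":
        # Only errored results carry an error payload
        print(f"{custom_id}: Error: {result.result.error}")
    else:
        # "canceled" or "expired": no message and no error payload
        print(f"{custom_id}: Not processed ({result.result.type})")
```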
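Because individual requests can fail while the batch as a whole still ends, it helps to separate finished outputs from the IDs that need another look. The sketch below shows one way to do that. `split_results` and `fake_entry` are hypothetical helpers, and the sample entries are hand-built stand-ins that only mimic the `custom_id` and `result.type` attributes used in Step 3, so the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List, Tuple


def split_results(results: Iterable) -> Tuple[Dict[str, str], List[str]]:
    """Separate succeeded outputs from custom_ids that did not succeed."""
    succeeded: Dict[str, str] = {}
    unfinished: List[str] = []
    for entry in results:
        if entry.result.type == "succeeded":
            succeeded[entry.custom_id] = entry.result.message.content[0].text
        else:
            # "errored", "canceled", or "expired"
            unfinished.append(entry.custom_id)
    return succeeded, unfinished


def fake_entry(custom_id: str, result_type: str, text: str = "") -> SimpleNamespace:
    """Hand-built stand-in for one result object (illustration only)."""
    message = SimpleNamespace(content=[SimpleNamespace(type="text", text=text)])
    return SimpleNamespace(
        custom_id=custom_id,
        result=SimpleNamespace(type=result_type, message=message),
    )


sample = [
    fake_entry("prod_001", "succeeded", "Great sound."),
    fake_entry("prod_002", "expired"),
    fake_entry("prod_003", "errored"),
]
done, retry_ids = split_results(sample)
print(done)
print(retry_ids)
```

In a live run you would pass `client.messages.batches.results(batch_id)` instead of `sample`. Inspect errored requests before resubmitting them, since a request that failed validation will fail again unchanged.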
TypeScript Example
Install: npm install @anthropic-ai/sdk
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function runBatch() {
  // Create batch
  const batch = await client.messages.batches.create({
    requests: [
      {
        custom_id: "email_001",
        params: {
          model: "claude-sonnet-4-5",
          max_tokens: 200,
          messages: [{ role: "user", content: "Summarize this support email in one sentence: 'My order arrived broken and I need a replacement ASAP'" }],
        },
      },
      {
        custom_id: "email_002",
        params: {
          model: "claude-sonnet-4-5",
          max_tokens: 200,
          messages: [{ role: "user", content: "Summarize this support email in one sentence: 'I was charged twice for the same subscription'" }],
        },
      },
    ],
  });
  console.log("Batch ID:", batch.id);

  // Poll until complete
  let status = batch.processing_status;
  while (status !== "ended") {
    await new Promise((r) => setTimeout(r, 10000));
    const updated = await client.messages.batches.retrieve(batch.id);
    status = updated.processing_status;
    console.log("Status:", status);
  }

  // Retrieve results
  for await (const result of await client.messages.batches.results(batch.id)) {
    if (result.result.type === "succeeded") {
      const text = result.result.message.content[0];
      if (text.type === "text") {
        console.log(`${result.custom_id}:`, text.text);
      }
    } else {
      // "errored", "canceled", or "expired"
      console.log(`${result.custom_id}: not completed (${result.result.type})`);
    }
  }
}

runBatch().catch(console.error);
```
Limits and Constraints
| Limit | Value |
|---|---|
| Max requests per batch | 100,000 |
| Max batch size | 256 MB |
| Max processing time | 24 hours |
| Result expiry | 29 days after creation |
| Concurrent batches | No hard limit (subject to rate limits) |
| Supported models | All Claude 3.x and Claude 4.x models |
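Datasets that exceed the per-batch limits have to be split across several batches. One possible approach is sketched below. `chunk_requests` is a hypothetical helper, the limits are passed in as arguments rather than hard-coded, and measuring each request by its JSON-serialized length is an assumption that only approximates how the API counts batch size.

```python
import json
from typing import Iterable, Iterator, List


def chunk_requests(
    requests: Iterable[dict], max_requests: int, max_bytes: int
) -> Iterator[List[dict]]:
    """Yield lists of requests that stay within both limits.

    A single request larger than max_bytes is still yielded, alone in its
    own chunk, so the caller can decide how to handle it.
    """
    chunk: List[dict] = []
    chunk_bytes = 0
    for request in requests:
        size = len(json.dumps(request).encode("utf-8"))  # approximate size
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)
        chunk_bytes += size
    if chunk:
        yield chunk


# Tiny limits so the splitting is visible; use the real limits in practice.
demo = [{"custom_id": f"row_{i}"} for i in range(5)]
chunks = list(chunk_requests(demo, max_requests=2, max_bytes=10_000))
print([len(c) for c in chunks])
```

Each yielded chunk can then be submitted as its own batch with the same `client.messages.batches.create(requests=chunk)` call shown in the Python example.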
Batch API vs Standard API
| Factor | Standard API | Batch API |
|---|---|---|
| Response time | Real-time (seconds) | Async (minutes to 24 hours) |
| Cost | Full price | 50% discount |
| Max requests per call | 1 | 100,000 |
| Best for | Chatbots, live search, interactive tools | Data pipelines, bulk processing, scheduled jobs |
| Streaming support | Yes | No |
| Result retrieval | Immediate | Poll for status, then download results |
Tips for Best Results
- Always set a unique custom_id. Use your database primary key or a UUID so you can match results back easily.
- Start small. Test with 10–50 requests before scaling to 10,000. Confirm your prompts produce the right output format first.
- Handle errors per request. Individual requests in a batch can fail without failing the whole batch. Check each result's type.
- Use Claude Haiku for high-volume, simple tasks. At $0.40 / MTok input (batch), it's the most cost-effective option for classification and short-form generation.
- Download results before day 29. Set a reminder — results expire and are deleted after 29 days.
- Combine with prompt caching. If many requests share a long system prompt, add prompt caching to reduce costs further. See our prompt caching guide.
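As a sketch of the prompt caching tip, the shared instructions can be moved into a `system` block marked with `cache_control`, so that every request in the batch carries an identical cacheable prefix. The prompt text, the ticket data, and the `build_cached_request` helper are placeholders. Cache hits inside a batch are best-effort, and very short prompts fall below the minimum cacheable length, so treat any extra savings as a bonus rather than a guarantee.

```python
from typing import List

# Placeholder prompt; a real shared prompt would be much longer.
SHARED_SYSTEM_PROMPT = (
    "You are a support-ticket classifier. Reply with exactly one label: "
    "billing, shipping, technical, or other."
)


def build_cached_request(custom_id: str, ticket_text: str) -> dict:
    """Build one batch request whose system prompt is marked as cacheable."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 50,
            "system": [
                {
                    "type": "text",
                    "text": SHARED_SYSTEM_PROMPT,
                    # Marks the end of the prefix that may be cached
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            "messages": [{"role": "user", "content": ticket_text}],
        },
    }


tickets = {
    "ticket_001": "I was charged twice for the same subscription",
    "ticket_002": "My order arrived broken and I need a replacement ASAP",
}
cached_requests: List[dict] = [
    build_cached_request(ticket_id, text) for ticket_id, text in tickets.items()
]
print(len(cached_requests), "requests share one system prompt")
```

Only the user message differs between requests; the system block must be byte-for-byte identical for a cache hit to be possible.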
Need Help Building AI Pipelines for Your Business?
At Mayank Digital Labs, we build custom AI automation systems — from Claude API integrations and batch processing pipelines to n8n workflows and CRM automation. Whether you're a startup or an established brand, we build systems that get real results.
Frequently Asked Questions
What is the Claude Batch API?
The Claude Batch API is an asynchronous endpoint that lets you submit up to 100,000 AI requests at once and retrieve results later, at 50% lower cost than the real-time standard API. It's designed for non-interactive, background processing workloads.
How much cheaper is the Claude Batch API?
The Batch API costs exactly 50% less per token for both input and output. For example, Claude Sonnet 4 standard costs $3 per million input tokens; batch costs $1.50 per million input tokens.
How long do batch results take?
Results are available within 24 hours, but most batches complete much faster — sometimes within minutes for smaller jobs. You check the status by polling the API with your batch ID.
When should I NOT use the Batch API?
Don't use the Batch API for real-time applications like chatbots, live search, or any feature where a user is actively waiting for a response. Use the standard API for those cases.
What is the maximum batch size?
Each batch can contain up to 100,000 requests or 256 MB in total size, whichever limit is reached first. You can run multiple batches in parallel to handle larger datasets.