If you're scoping an AI project on AWS, the first question is always the same: how much will Amazon Bedrock actually cost? The answer is more complicated than the per-token rates suggest.
The problem most teams run into: they budget for model inference tokens and get blindsided by a $350/month OpenSearch Serverless bill for their Knowledge Base, agent calls that consume 5-10x the expected tokens, or CloudWatch logging costs that spiral from verbose prompt storage. I've seen this pattern repeatedly, and it's almost always preventable.
This guide covers every Amazon Bedrock pricing dimension, from per-token model costs to the hidden infrastructure charges that cause bill shock, with real dollar amounts throughout. All pricing data verified as of February 2026. I also maintain an up-to-date Amazon Bedrock Pricing Calculator so you can estimate your specific monthly costs after reading this.
How Amazon Bedrock Pricing Works
Before diving into specific model rates, you need a mental model for how Bedrock costs add up. Most pricing guides jump straight to per-token tables, but that only tells a third of the story.
Bedrock uses a token-based, pay-per-use pricing model. A token is approximately 4 characters or 0.75 words in English. All text model pricing is quoted per 1 million tokens. Input tokens (your prompt) are always cheaper than output tokens (the model's response), typically by a factor of 3-5x. This asymmetry exists because generating output requires significantly more compute than processing input.
There are three distinct cost layers that make up your total Bedrock bill:
Layer 1 is what everyone covers: the per-token cost of calling a foundation model. Layer 2 is where teams get surprised: Knowledge Bases, Agents, Guardrails, and Flows each add their own charges on top of model inference. Layer 3 is the operational tail: logging, storage, and data transfer costs that accumulate silently.
Understanding all three layers is the difference between an accurate budget and a surprise bill. Let's start with a detail most guides skip.
Token-Based Pricing Explained
Every time you send a request to a Bedrock model, you're charged for two things: the tokens in your prompt (input) and the tokens in the model's response (output). Pricing is always quoted per 1 million tokens (per 1M tokens).
Here's what you need to know:
- Input tokens include your system message, user message, and any context you provide
- Output tokens are what the model generates in its response
- Extended thinking tokens (used by models like Claude with chain-of-thought reasoning) are billed as output tokens, not as a separate tier
- Image models are priced per image generated or processed, not per token
- Embedding models charge for input tokens only (no output)
Input vs. Output Token Costs
The pricing gap between input and output tokens matters for cost planning. For example, Claude Sonnet 4.5 charges $3.00 per 1M input tokens but $15.00 per 1M output tokens, a 5x difference. If your application generates long responses (code generation, detailed analysis), output tokens will dominate your bill.
Some models also charge more for extended context windows. Claude models with 1M token context windows bill inputs beyond 200K tokens at approximately 2x the standard rate. This is worth knowing if you're building RAG applications that stuff large documents into the context.
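To make the token math concrete, here's a minimal cost-estimator sketch. The rates are hardcoded from the tables in this guide, the ~4-characters-per-token rule is an approximation, and the 200K long-context threshold applies only to the Claude 1M-context models; verify everything against the current AWS pricing page before budgeting.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD cost for one request; rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1e6

def long_context_input_cost(input_tokens: int, base_rate: float = 3.00) -> float:
    """Claude 1M-context billing: input beyond 200K tokens bills at ~2x."""
    standard = min(input_tokens, 200_000)
    extended = max(input_tokens - 200_000, 0)
    return (standard * base_rate + extended * base_rate * 2) / 1e6

# Claude Sonnet 4.5 on-demand ($3.00 in / $15.00 out per 1M tokens):
# a 2,000-token prompt with a 1,500-token response costs about $0.0285
cost = request_cost(2_000, 1_500, 3.00, 15.00)
```

Notice how the 1,500 output tokens account for nearly 80% of that request's cost; this is why long-response workloads are output-dominated.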
Does Bedrock Have a Free Tier?
Amazon Bedrock does not have a permanent free tier. You pay from the first API call.
However, new AWS accounts created after July 15, 2025 receive $200 in credits ($100 on sign-up plus $100 for completing guided activities, which include a Bedrock text playground tutorial). These credits are usable across 200+ AWS services including Bedrock and expire after 6 months. If you're evaluating Bedrock, this is a good way to test without upfront cost.
Now that you understand the pricing model, let me show you the costs that most guides skip entirely: the hidden infrastructure charges that catch teams off guard.
Hidden Costs You Need to Know Before Building
This section is the reason I wrote this guide. Every competitor article mentions hidden costs in a paragraph or two, but none of them give you the actual dollar amounts. Here's the full picture.
When you move beyond basic model inference and start using Bedrock features like Knowledge Bases, Agents, and Guardrails, you trigger additional charges that can easily exceed your inference bill. Understanding these upfront, before you architect your solution, is the best way to avoid bill shock.
| Hidden Cost Component | Minimum Monthly Floor | Mitigation Strategy |
|---|---|---|
| OpenSearch Serverless (Knowledge Base) | ~$350/month (2 OCUs) | Switch to Amazon S3 Vectors (up to 90% cheaper) |
| Agent cost multiplication | 5-10x your naive token estimate | Build focused, single-purpose agents |
| Guardrails (content + denied topics) | $0.30/1K text units combined | Enable only the filters you need |
| Flows | $0.035/1K node transitions | Minimize node count per workflow |
| CloudWatch Logging | $0.50/GB ingestion | Disable full prompt logging in production |
| Reranking (Amazon Rerank 1.0) | $1.00/1K queries | Use only when retrieval quality requires it |
Knowledge Base Costs (OpenSearch Serverless Floor)
Knowledge Bases for Bedrock don't have a separate KB fee. The costs come from the underlying components: embedding model inference, vector database storage, and the foundation model for RAG responses.
The catch is the vector database. If you use Amazon OpenSearch Serverless (the default option), you'll pay a minimum of approximately $350 per month for the required 2 OCUs (OpenSearch Compute Units) at ~$0.24 per OCU per hour, regardless of how many queries you run.
Here's the good news: Amazon S3 Vectors, launched in December 2025, is up to 90% cheaper and supports trillions of vectors with sub-second latency. If you're starting a new Knowledge Base project, S3 Vectors should be your default choice. For existing deployments, the migration is worth the effort.
You can estimate your OpenSearch infrastructure costs specifically with our OpenSearch pricing calculator.
Additional Knowledge Base costs to budget for:
- Reranking with Amazon Rerank 1.0: $1.00 per 1,000 queries (up to 100 document chunks per query)
- Structured data retrieval: $0.002 per GenerateQuery API call
- Data Automation for document parsing: $0.01 per page (standard) or $0.04 per page (custom output)
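Here's a quick sanity check you can run before choosing a vector store. The OCU floor and reranking rates come from this guide; the ~$35/month S3 Vectors figure is a rough placeholder estimate (actual S3 Vectors cost depends on storage and query volume), and embedding/generation inference bills separately.

```python
def kb_monthly_overhead(queries_per_month: int, use_opensearch: bool = True,
                        rerank: bool = False) -> float:
    """Knowledge Base overhead (vector DB + optional reranking), USD/month.

    Excludes embedding and generation-model inference, billed separately.
    """
    # OpenSearch Serverless floor: 2 OCUs x ~$0.24/OCU-hr x ~730 hrs/month
    # vs a rough ~$35/month placeholder for S3 Vectors
    vector_db = 2 * 0.24 * 730 if use_opensearch else 35.0
    # Amazon Rerank 1.0 at $1.00 per 1,000 queries
    reranking = queries_per_month / 1_000 * 1.00 if rerank else 0.0
    return vector_db + reranking

# 300K queries/month on the default OpenSearch setup with reranking:
overhead = kb_monthly_overhead(300_000, use_opensearch=True, rerank=True)
```

The takeaway: the OpenSearch floor dominates at low query volumes, while reranking dominates at high ones.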
Agent Cost Multiplication
This is the hidden cost that catches the most teams off guard. Bedrock Agents don't charge a per-invocation fee (the InvokeAgent API itself is free), but a single user query triggers multiple internal model invocations:
- Planning step: the model determines what to do
- Tool selection: the model selects which APIs to call
- Tool execution: each tool call may involve additional model calls
- Summarization: the model synthesizes results into a final response
The result? A single user query can consume 5-10x the tokens you'd estimate from looking at the prompt and response alone. The internal reasoning traces and multi-step processing are all billed at standard token rates.
How to mitigate: Build focused, single-purpose agents rather than Swiss-army-knife agents that try to do everything. Fewer tools means fewer internal reasoning steps.
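To see why the multiplier matters, here's a sketch of the estimate you should be making. The token counts and the 6-call loop are illustrative assumptions, not measured values; profile your own agent traces to get real numbers.

```python
def agent_monthly_cost(interactions: int, calls_per_interaction: int,
                       in_tokens: int, out_tokens: int,
                       in_rate: float, out_rate: float) -> float:
    """Monthly agent inference cost; rates are USD per 1M tokens."""
    calls = interactions * calls_per_interaction
    return calls * (in_tokens * in_rate + out_tokens * out_rate) / 1e6

# Claude Sonnet 4.5, 150K interactions/month, ~3K in / 800 out per call.
# Naive estimate assumes 1 model call per interaction; a realistic agent
# loop (plan, select tools, execute, summarize) often makes ~6.
naive = agent_monthly_cost(150_000, 1, 3_000, 800, 3.00, 15.00)
realistic = agent_monthly_cost(150_000, 6, 3_000, 800, 3.00, 15.00)
```

The gap between `naive` and `realistic` is exactly the budgeting miss described above.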
Guardrails Pricing
Guardrails add safety filters to your model inputs and outputs. Pricing is per filter, per 1,000 text units (where 1 text unit = up to 1,000 characters):
| Filter | Price per 1,000 Text Units |
|---|---|
| Content filters (text) | $0.15 |
| Denied topics | $0.15 |
| Sensitive information (PII, ML-based) | $0.10 |
| Contextual grounding checks | $0.10 |
| Automated Reasoning checks | $0.17 per policy |
| Word/regex filters | Free |
These prices were reduced by 80-85% in December 2024, which makes Guardrails more practical at scale. But they still add up. A customer support chatbot processing 1,000 queries per hour with content filters and denied topics enabled (assuming ~3 text units per query) costs approximately $0.90 per hour in Guardrails charges alone.
Important billing detail: if the guardrail blocks the input prompt, you pay for the guardrail evaluation only (no model inference charge). If it blocks the model response, you pay for both guardrail evaluation and model inference.
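Since Guardrails bill per filter, the cost scales with both traffic and the number of filters you enable. A small sketch using the rates above (the ~3 text units per query is an assumption; measure your own prompt lengths):

```python
def guardrails_cost(requests: int, text_units_per_request: int,
                    filter_rates: list[float]) -> float:
    """Guardrails cost in USD; filter_rates are USD per 1,000 text units."""
    units = requests * text_units_per_request
    return units / 1_000 * sum(filter_rates)

# 1,000 queries/hour, ~3 text units each, content filters + denied topics
# ($0.15 each per 1K text units):
hourly = guardrails_cost(1_000, 3, [0.15, 0.15])
```

Dropping a filter you don't need cuts this cost linearly, which is why "enable only the filters you need" is the mitigation in the table above.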
Flows and Additional Feature Costs
Bedrock Flows (the visual workflow builder for multi-step AI pipelines) charges $0.035 per 1,000 node transitions, plus standard FM inference costs for any model calls within the flow.
For a simple news summarization flow with 25 node transitions running 960 times per month, the flow transition cost is only $0.84/month, but the model inference within those flows is billed separately.
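The flow-transition arithmetic for that example looks like this (node counts and run volume are the illustrative figures from the paragraph above):

```python
# News summarization flow: 25 node transitions per run, 960 runs/month
transitions = 25 * 960                    # 24,000 transitions/month
flow_cost = transitions / 1_000 * 0.035   # Flows rate: $0.035 per 1K transitions
```

At $0.84/month the transition fee is negligible; the model calls inside the flow are what you actually budget for.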
CloudWatch Logging Costs
This one is easy to overlook. If you enable full model invocation logging (including prompts and responses), Bedrock can generate hundreds of gigabytes of log data. CloudWatch Logs charges $0.50 per GB for ingestion.
My recommendation: disable full prompt and response logging in production unless you need it for compliance or debugging. If you do need it, set a CloudWatch Logs retention policy to avoid indefinite storage costs.
With the full cost picture in mind, let's look at what you'll pay for model inference, the core of your Bedrock bill.
On-Demand Model Pricing by Provider
On-demand pricing is what you pay per token without any commitment or volume discounts. This is how most teams start with Bedrock, and the price range is enormous: from $0.035 per 1M input tokens (Nova Micro) to $75.00 per 1M output tokens (Claude Opus 4.6), a 2,000x difference.
The model you choose is the single biggest cost lever you have. Let me walk through the key providers, organized from budget-friendly to premium.
Amazon Nova Models (Budget-Friendly)
Amazon's own Nova models are designed for price-performance leadership. They're the cheapest option for most use cases on Bedrock.
| Model | Context Window | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|---|
| Nova Micro | 128K | $0.035 | $0.14 | Simple classification, routing, extraction |
| Nova Lite | 300K | $0.06 | $0.24 | General Q&A, summarization, multimodal |
| Nova Pro | 300K | $0.80 | $3.20 | Complex reasoning, coding, multi-step tasks |
| Nova Premier | 1M | $2.50 | $12.50 | Frontier intelligence, 1M context |
Nova Micro is text-only, while Nova Lite, Pro, and Premier support multimodal input (text, image, and video). All Nova models support Standard, Priority, Flex, and Batch tiers, with batch pricing at 50% of on-demand.
Nova Act, Amazon's agent-oriented model, uses a different pricing model: $4.75 per agent hour (elapsed real-world time while an agent is working, excluding human-in-the-loop wait time).
Anthropic Claude Models
Claude models are the most popular choice on Bedrock for complex reasoning and coding tasks. Here are the current prices:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Batch Input | Batch Output |
|---|---|---|---|---|
| Claude Opus 4.6 | $15.00 | $75.00 | $7.50 | $37.50 |
| Claude Sonnet 4.5/4.6 | $3.00 | $15.00 | $1.50 | $7.50 |
| Claude Opus 4.5 | $5.00 | $25.00 | $2.50 | $12.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.50 | $2.50 |
| Claude 3 Haiku | $0.25 | $1.25 | $0.125 | $0.625 |
Long context pricing: For Claude models with 1M context windows, inputs beyond 200K tokens are billed at 2x the standard rate. For example, Claude Sonnet 4.5 charges $6.00 per 1M tokens for input beyond the 200K threshold.
Extended Access pricing: Legacy models like Claude 3.5 Sonnet (now in Public Extended Access) increased from $3/$15 to $6/$30 per 1M tokens as of December 2025. If you're still using these, migrating to Claude Sonnet 4.5 puts you back at $3/$15 with better performance.
Prompt caching is available for Claude models with two TTL options: 5-minute (1.25x write, 0.1x read) and 1-hour (2.0x write, 0.1x read). I'll cover caching math in the optimization section.
Meta Llama, Mistral, and Open-Source Models
Open-source models on Bedrock offer competitive pricing with no vendor lock-in on the model weights:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Mistral Large 3 | $0.50 | $1.50 |
| Magistral Small 1.2 | $0.50 | $1.50 |
| Devstral 2 135B | $0.40 | $2.00 |
| Ministral 3B | $0.10 | $0.10 |
| DeepSeek v3.2 | $0.62 | $1.85 |
| Llama 2 Chat (13B) | $0.75 | $1.00 |
| Llama 2 Chat (70B) | $1.95 | $2.56 |
Mistral's smaller models (Ministral 3B at $0.10/$0.10, Voxtral Mini at $0.04/$0.04) are excellent budget options for simple tasks. DeepSeek v3.2, added in February 2026, offers strong reasoning at $0.62/$1.85.
All Other Providers (DeepSeek, Google, NVIDIA, Qwen, and More)
Bedrock now hosts models from 15+ providers. Here's a curated selection of the newest and most cost-effective options:
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Google | Gemma 3 4B | $0.04 | $0.08 |
| Google | Gemma 3 27B | $0.23 | $0.38 |
| NVIDIA | Nemotron Nano 2 | $0.06 | $0.23 |
| OpenAI (GPT-OSS) | GPT-OSS 20B | $0.07 | $0.30 |
| OpenAI (GPT-OSS) | GPT-OSS 120B | $0.15 | $0.60 |
| Qwen | Qwen3 32B | $0.15 | $0.60 |
| Qwen | Qwen3 Coder Next | $0.50 | $1.20 |
| MiniMax AI | MiniMax M2.1 | $0.30 | $1.20 |
| Moonshot AI | Kimi K2.5 | $0.60 | $3.00 |
| Z AI | GLM 4.7 | $0.60 | $2.20 |
| Z AI | GLM 4.7 Flash | $0.07 | $0.40 |
| Writer | Palmyra X5 | $0.60 | $6.00 |
Google Gemma 3 4B at $0.04/$0.08 is one of the cheapest models on Bedrock (alongside Voxtral Mini at $0.04/$0.04 and Nova Micro at $0.035/$0.14). Six of the models in this guide (DeepSeek v3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next) were added in February 2026.
For the exhaustive model list, see the official Amazon Bedrock pricing page.
Image and Multimodal Model Pricing
Stability AI provides image generation and editing models on Bedrock, priced per operation:
| Operation | Price per Image |
|---|---|
| Fast Upscale | $0.03 |
| Outpaint | $0.06 |
| Remove Background / Erase Object / Control / Inpaint | $0.07 |
| Style Transfer | $0.08 |
| Conservative Upscale | $0.40 |
| Creative Upscale | $0.60 |
Amazon Nova Canvas (image generation) and Nova Reel (video generation) are also available with per-image and per-second-of-video pricing respectively.
Those are the standard on-demand rates, but Bedrock offers several pricing tiers that can significantly reduce your costs. Let me walk you through when each one makes sense.
Bedrock Service Tiers: Standard, Priority, Flex, and Batch
Beyond on-demand pricing, Bedrock offers multiple tiers that let you trade flexibility for savings (or pay a premium for guaranteed performance). These tiers were formalized in February 2025, and most competitor guides still don't cover them.
Understanding which tier to use can save you 50% or more on your inference costs. Here's how to decide:
Standard Tier (Default)
The Standard tier is the default for all Bedrock inference. You get consistent performance at the base rates listed in the pricing tables above. This is the right choice for most production workloads where you need reliable latency without paying a premium.
Priority Tier (75% Premium for Low Latency)
The Priority tier provides preferential compute allocation with up to 25% better output-tokens-per-second latency at a 75% price premium over Standard.
Use Priority only for mission-critical, latency-sensitive applications where response time directly impacts user experience or revenue. For most workloads, the 75% premium isn't justified.
Example: Claude Sonnet 4.5 at Priority tier costs $5.25/$26.25 per 1M tokens instead of $3.00/$15.00 at Standard.
Flex Tier (50% Discount for Non-Urgent Work)
The Flex tier is one of the best cost optimization levers available: a 50% discount from Standard rates in exchange for relaxed latency. The key difference from batch: Flex is accessible through the regular Converse and InvokeModel APIs, so you don't need to restructure your application around file-based batch processing.
Flex is ideal for evaluations, background summarizations, non-urgent agentic tasks, and any workload where a few seconds of additional latency is acceptable.
Batch Inference (50% Discount for Async Processing)
Batch inference gives you the same 50% discount as Flex, but processes requests asynchronously from a file. You submit a batch job and get results when processing completes.
Best for: large-scale document processing, data enrichment, model evaluations, and any workload that doesn't need real-time responses.
| Model | Standard Input/Output | Batch Input/Output |
|---|---|---|
| Claude Sonnet 4.5 | $3.00 / $15.00 | $1.50 / $7.50 |
| Claude Haiku 4.5 | $1.00 / $5.00 | $0.50 / $2.50 |
| Nova Lite | $0.06 / $0.24 | $0.03 / $0.12 |
| Nova Pro | $0.80 / $3.20 | $0.40 / $1.60 |
Batch and prompt caching discounts can be combined for multiplicative savings. More on that in the optimization section.
Provisioned Throughput (Reserved Capacity)
Provisioned Throughput reserves dedicated model capacity at fixed hourly rates per Model Unit (MU). This is the enterprise option for very high, consistent inference volumes.
Commitment options: no commitment, 1-month, or 6-month (longer commitments cost less). Monthly costs range from $15,000 to $37,000+ per MU depending on the model and commitment term.
| Model | 1-Month Rate (per MU/hr) | Monthly Cost |
|---|---|---|
| Meta Llama 2 (1 MU) | $21.18 | ~$15,758 |
| Amazon Titan Text Express (2 MUs) | $18.40 | ~$27,379 |
| Cohere Command (1 MU) | $39.60 | ~$29,462 |
Provisioned Throughput only makes sense when your per-token costs at on-demand rates consistently exceed the hourly MU cost. Run a breakeven analysis before committing.
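A breakeven sketch you can adapt: compare the fixed monthly MU cost to what the same volume would cost on-demand. The $1.50/1M blended rate below is a hypothetical average across your input/output mix, not a published figure.

```python
def provisioned_monthly_cost(mu_hourly_rate: float, mus: int = 1,
                             hours: float = 730) -> float:
    """Fixed monthly cost of reserved Model Units."""
    return mu_hourly_rate * mus * hours

# Llama 2 at $21.18/MU-hr vs a hypothetical $1.50/1M blended on-demand rate:
pt_monthly = provisioned_monthly_cost(21.18)      # ~$15,461/month
breakeven_tokens = pt_monthly / 1.50 * 1e6        # ~10.3B tokens/month
```

Below roughly 10 billion blended tokens per month in this example, on-demand is cheaper; above it, the reserved MU starts to pay off (assuming one MU can actually serve that throughput).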
For the full tier comparison and model availability, see the Bedrock service tiers documentation.
If you need a customized model rather than the base versions, here's what that costs.
Fine-Tuning and Model Customization Costs
Fine-tuning a foundation model on your own data involves three cost components: training, storage, and inference on the customized model.
This isn't something every team needs, but when you do, it helps to understand the full cost picture upfront.
Training Costs
Fine-tuning is billed per token trained (total tokens in your training data multiplied by the number of epochs):
| Model | Price per 1,000 Training Tokens |
|---|---|
| Amazon Titan Text Lite | $0.001 |
| Amazon Titan Text Express | $0.008 |
| Meta Llama 2 (13B) | $0.00149 |
| Meta Llama 2 (70B) | $0.00799 |
| Cohere Command | $0.004 |
Reinforcement Fine-Tuning (RFT) uses hourly billing instead of per-token: $80 per hour for GPT-OSS 20B and Qwen3 32B models.
Custom Model Storage and Inference
Every custom model costs $1.95 per month for storage. Inference on custom models typically requires provisioned throughput, though some models support on-demand pricing.
Full cost example: Fine-tuning Titan Text Express with 20M tokens over 3 epochs:
- Training: (60M tokens / 1K) x $0.008 = $480 one-time
- Storage: $1.95/month
- Inference (1-month commitment, 1 MU): $13,248/month
- Inference (6-month commitment, 1 MU): $10,656/month
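The one-time training figure is just (dataset tokens x epochs x per-1K rate); a quick check using the Titan Text Express rate from the table:

```python
# Fine-tuning Titan Text Express: 20M training tokens, 3 epochs
tokens_trained = 20_000_000 * 3                   # 60M tokens billed
training_cost = tokens_trained / 1_000 * 0.008    # $0.008 per 1K training tokens
storage_monthly = 1.95                            # custom model storage, USD/month
```

Note that training and storage are rounding errors next to the provisioned-throughput inference cost; that hosting line is what decides whether fine-tuning is affordable.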
Model Distillation (Cheaper Alternative)
If full fine-tuning seems expensive, consider model distillation. This transfers knowledge from a larger "teacher" model to a smaller "student" model, and the results are impressive: distilled models can be up to 500% faster and 75% less expensive with less than 2% accuracy loss.
Distillation costs include synthetic data generation (at on-demand teacher model rates) and student model training. It's been generally available since May 2025 and supports pathways like Nova Premier to Nova Pro, and Claude 3.5 Sonnet v2 to Llama 3.2 1B/3B.
Model evaluation is also worth noting: algorithmic scores (relevance, coherence, fluency) are free. Human evaluation costs $0.21 per completed task. LLM-as-a-judge evaluation charges at standard token rates for the judge model.
All these per-token rates and pricing tiers can feel abstract. Let me show you what Bedrock actually costs for real-world use cases.
What Does Bedrock Actually Cost? Real-World Scenarios
This is the section I wish existed when I first started budgeting Bedrock projects. No competitor provides use-case cost projections, so here are three realistic scenarios with monthly estimates.
These estimates use US East region pricing at Standard tier. Your actual costs will vary based on configuration, traffic patterns, and model choice.
RAG Chatbot (Knowledge Base + Claude Haiku 4.5)
Assumptions: 10,000 queries per day, ~2,000 input tokens + 500 output tokens per query, using Claude Haiku 4.5 for generation and Titan Embeddings V2 for embedding.
| Cost Component | Monthly Estimate |
|---|---|
| Claude Haiku 4.5 inference (300K queries: 600M input + 150M output tokens) | $600 input + $750 output = $1,350 |
| Titan Embeddings V2 (600M input tokens for embedding queries + chunks) | $12 |
| Vector database (S3 Vectors) | ~$35 |
| Vector database (OpenSearch Serverless, if used instead) | ~$350 |
| Reranking (optional, 300K queries) | $300 |
| Total (with S3 Vectors, no reranking) | ~$1,397/month |
| Total (with OpenSearch + reranking) | ~$2,012/month |
The vector database choice alone creates a $315/month difference. This is why I recommend S3 Vectors as the default for new projects.
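The scenario table above reduces to a few lines of arithmetic you can rerun with your own traffic. The ~$0.02/1M embedding rate is inferred from the table ($12 for 600M tokens), and the $35/$350 vector-store figures are the estimates used in this guide.

```python
queries = 10_000 * 30                                    # 300K queries/month
# Claude Haiku 4.5: $1.00 in / $5.00 out per 1M tokens
inference = queries * (2_000 * 1.00 + 500 * 5.00) / 1e6  # ~$1,350
embeddings = 600 * 0.02                                  # 600M tokens, ~$12
total_s3 = inference + embeddings + 35                   # S3 Vectors, no rerank
total_oss = inference + embeddings + 350 + 300           # OpenSearch + rerank
```

Swap in your own query volume and token counts; inference scales linearly while the OpenSearch floor does not.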
Document Summarization Pipeline (Batch + Nova Lite)
Assumptions: 50,000 documents per day, ~4,000 input tokens + 1,000 output tokens per document, using Nova Lite with batch inference (50% discount).
| Cost Component | Monthly Estimate |
|---|---|
| Nova Lite batch input (1.5M docs x 4K tokens = 6B tokens) | 6,000 x $0.03 = $180 |
| Nova Lite batch output (1.5M docs x 1K tokens = 1.5B tokens) | 1,500 x $0.12 = $180 |
| S3 storage for batch files | ~$5 |
| Total | ~$365/month |
Using batch inference cuts the inference cost in half compared to on-demand (~$725/month total at standard rates). If you used Nova Micro instead of Nova Lite for simpler documents, the batch inference cost drops to approximately $210/month.
Customer Service Agent (Multi-Step Workflow)
Assumptions: 5,000 interactions per day, using Claude Sonnet 4.5 with 6 internal model calls per interaction (agent planning, tool selection, execution, summarization), ~3,000 input + 800 output tokens per call, Guardrails with content filters and denied topics enabled.
| Cost Component | Monthly Estimate |
|---|---|
| Claude Sonnet 4.5 input (150K interactions x 6 calls = 900K calls x 3K tokens = 2.7B tokens) | 2,700 x $3.00 = $8,100 |
| Claude Sonnet 4.5 output (900K calls x 800 tokens = 720M tokens) | 720 x $15.00 = $10,800 |
| Guardrails (content + denied topics, ~3 text units per interaction) | 150K x 3 x $0.30/1K = $135 |
| Total | ~$19,035/month |
This is the agent cost multiplier in action. A naive estimate based on one model call per interaction would suggest ~$3,200/month. The actual cost with 6 internal calls is roughly 6x higher. Budget accordingly.
Cost Estimation with Our Bedrock Pricing Calculator
These scenarios give you a ballpark, but your actual configuration will differ. Use the Amazon Bedrock Pricing Calculator to estimate costs for your specific model, traffic volume, and feature combination. It covers all models, tiers, and features so you can budget accurately before building.
Now that you know what to expect, here are 8 strategies to bring those costs down.
8 Strategies to Reduce Your Bedrock Costs
Each strategy below includes the savings potential and the specific scenario where it works best. I've ordered them from highest impact to most specialized.
1. Right-Size Your Model Selection
Savings: 50-99% per request
This is the single biggest cost lever. Most teams default to a powerful model like Claude Sonnet 4.5 ($3.00/$15.00) when Nova Micro ($0.035/$0.14) or Nova Lite ($0.06/$0.24) would produce acceptable results for simpler tasks.
Rule of thumb: Start with the cheapest model that meets your quality bar, then escalate only when testing shows the cheaper model falls short. Use Intelligent Prompt Routing ($1.00 per 1,000 requests) to automatically route each request to the cheapest adequate model within a family. AWS reports up to 30% cost reduction without compromising accuracy.
2. Use Batch Inference for Non-Real-Time Workloads
Savings: 50% on all tokens
Any workload that doesn't need real-time responses should use batch inference. Document processing, data enrichment, evaluations, content generation pipelines: all of these can run as batch jobs at half the on-demand price.
3. Enable Prompt Caching for Repeated Context
Savings: up to 90% on repeated input tokens
If your application reuses the same context across multiple requests (system prompts, shared documents, RAG context), prompt caching can dramatically reduce costs.
Here's how the math works for Claude Sonnet 4.5:
| Operation | Cost per 1M tokens | Relative to Standard |
|---|---|---|
| Standard input | $3.00 | 1.0x |
| Cache write (5-min TTL) | $3.75 | 1.25x |
| Cache write (1-hour TTL) | $6.00 | 2.0x |
| Cache read (hit) | $0.30 | 0.1x |
Breakeven: With the 5-minute TTL, caching pays off as soon as the cached context is read once. The first request costs 1.25x (cache write), so a write plus one hit totals 1.35x versus 2.0x for two uncached requests, and every additional hit costs only 0.1x.
Requirements: minimum 1,024 tokens per cache checkpoint, maximum 4 checkpoints, up to 32K cached tokens for Claude (20K for Nova). The 1-hour TTL option launched in January 2026 for Claude Sonnet 4.5, Haiku 4.5, and Opus 4.5.
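The breakeven math in code, using the Claude Sonnet 4.5 multipliers from the table (the 10K-token prefix size is an illustrative assumption):

```python
def cached_input_cost(uses: int, tokens: int, base_rate: float = 3.00,
                      write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost for `uses` requests sharing one cached prefix (5-min TTL)."""
    write = tokens / 1e6 * base_rate * write_mult        # first request writes
    reads = (uses - 1) * tokens / 1e6 * base_rate * read_mult
    return write + reads

def uncached_input_cost(uses: int, tokens: int, base_rate: float = 3.00) -> float:
    return uses * tokens / 1e6 * base_rate

# A 10K-token shared prefix used twice within 5 minutes:
# cached = 1.25x + 0.1x = 1.35x vs 2.0x uncached, so caching already wins
two_cached = cached_input_cost(2, 10_000)
two_plain = uncached_input_cost(2, 10_000)
```

A single use loses money (1.25x vs 1.0x), so only cache context you're confident will be reused inside the TTL window.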
4. Switch to the Flex Tier for Tolerant Workloads
Savings: 50% from Standard
Flex gives you the same 50% discount as batch but through the regular Converse and InvokeModel APIs. No restructuring needed. If your workload can tolerate slightly higher latency, Flex is the easiest cost optimization to implement.
5. Use Intelligent Prompt Routing
Savings: up to 30%
For $1.00 per 1,000 requests, Intelligent Prompt Routing automatically selects between models in the same family (like Claude Sonnet + Haiku, or Nova Pro + Lite) based on prompt complexity. Simple queries go to the cheaper model; complex ones go to the more capable one.
At 10,000 requests per day, the routing cost is $10/day ($300/month). If routing saves 30% on a $5,000/month inference bill, that's $1,500 saved for $300 spent.
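The ROI check from that example, as a sketch (the $5,000 baseline bill and the 30% savings figure are the assumptions from the paragraph above; your routing savings will differ):

```python
requests = 10_000 * 30                      # monthly requests
routing_fee = requests / 1_000 * 1.00       # $1.00 per 1K requests -> $300/month
baseline_bill = 5_000.0                     # hypothetical monthly inference spend
gross_savings = baseline_bill * 0.30        # AWS reports up to 30%
net_savings = gross_savings - routing_fee   # what you actually keep
```

Routing only makes sense when `net_savings` is positive, i.e. when your inference bill is large enough that 30% of it clears the per-request fee.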
6. Use Cross-Region Inference for Better Throughput
Savings: ~10% on select models (Global CRIS)
Cross-region inference routes requests across AWS regions with no additional routing or data transfer fee. The Global option offers approximately 10% savings on select models compared to geographic cross-region inference.
The main trade-off is additional latency (double-digit milliseconds when re-routing occurs). If your workload is latency-tolerant, Global cross-region inference gives you better throughput and modest savings at no extra cost.
7. Optimize Your Prompts
Savings: 10-50% on input tokens
Shorter prompts mean fewer input tokens, and fewer input tokens mean lower costs. This sounds obvious, but I've seen system prompts grow to thousands of tokens with redundant instructions.
Use the Prompt Optimization feature ($0.03 per 1,000 tokens) to automatically generate more efficient prompts. The one-time optimization cost is quickly recouped through ongoing input token savings.
8. Replace OpenSearch Serverless with S3 Vectors
Savings: up to 90% on vector storage
If you're running a Knowledge Base on OpenSearch Serverless and paying the ~$350/month floor, switching to Amazon S3 Vectors (launched December 2025) can cut your vector storage costs by up to 90%. S3 Vectors supports trillions of vectors with sub-second latency, making it suitable for most RAG workloads.
For a shift-left FinOps approach, catch these cost decisions during code review before they go to production.
Shift-Left Your FinOps Practice
Move cost awareness from monthly bill reviews to code review. CloudBurn shows AWS cost impact in every PR, empowering developers to make informed infrastructure decisions.
One question I hear a lot: is it cheaper to call model APIs directly instead of going through Bedrock?
Bedrock vs. Direct API Pricing
This is a common question for teams evaluating Bedrock, especially for Anthropic Claude models. The short answer: Bedrock's per-token rates for Claude closely match Anthropic's direct API pricing. The real comparison is about total cost of ownership, not just per-token rates.
Bedrock vs. Anthropic API
Claude Sonnet 4.5 on Bedrock costs $3.00/$15.00 per 1M tokens. Anthropic's direct API pricing is comparable. The per-token cost isn't where Bedrock differentiates.
Where Bedrock adds value:
- Batch inference (50% discount) and Flex tier (50% discount) are Bedrock-exclusive
- Prompt caching with combinable batch discounts
- Unified AWS billing with IAM-based access control and VPC integration
- Guardrails, Knowledge Bases, and Agents as managed features
- Cross-region inference for better throughput at no routing cost
When Bedrock Is the Better Deal
If you're already on AWS, Bedrock is almost always the better deal. You avoid data egress fees, get IAM authentication instead of managing API keys, and have access to cost optimization features (batch, Flex, caching) that reduce your effective per-token cost well below the base rate.
Bedrock also gives you access to 20+ model providers through a single API. If you want to evaluate or switch between Claude, Nova, Llama, Mistral, and DeepSeek, you can do that without integrating separate APIs.
Direct API access might make sense if you need only one model provider, don't use any AWS services, or need features not yet available on Bedrock. But for most AWS teams, Bedrock's infrastructure advantages and cost optimization features make it the more cost-effective choice.
For more details on optimizing your AI infrastructure costs, see the AWS Bedrock cost optimization guide.
Wrapping Up
Amazon Bedrock pricing has a lot of dimensions, but once you understand the three cost layers, the pricing decisions become clearer. Here's what to remember:
Key takeaways:
- Bedrock uses token-based pricing with input tokens 3-5x cheaper than output. Model costs range from $0.035 per 1M input tokens (Nova Micro) to $75.00 per 1M output tokens (Claude Opus 4.6).
- Hidden costs can exceed your inference bill. OpenSearch Serverless ($350/month floor), Agent cost multiplication (5-10x tokens), Guardrails, and logging all add up. Budget for the full stack, not just tokens.
- Batch inference and Flex tier each offer 50% savings. Prompt caching can cut costs by up to 90% on repeated context. These are your biggest cost levers after model selection.
- The right model for your use case is usually not the most expensive one. Start with Nova Micro or Claude Haiku and escalate only when quality demands it.
- Use the Amazon Bedrock Pricing Calculator to estimate your specific monthly costs before building.
For broader cost management strategies, check out the AWS cost optimization checklist, explore AWS cost estimation tools, and read our AWS cost estimation guide for your complete infrastructure.
What's your experience with Bedrock pricing? Have you run into any unexpected costs? I'd love to hear about your setup in the comments below.