ML workloads are expensive. A single ml.p4d.24xlarge training job can burn through thousands of dollars in hours. Yet most guides bury SageMaker Savings Plans in a few paragraphs within broader pricing discussions, leaving ML engineers without the specific guidance they need.
Here's what makes this guide different: I provide a dedicated, ML-specific framework for calculating your SageMaker Savings Plans commitment. Whether you're training-heavy, inference-heavy, or running a mixed pipeline, you'll walk away with a concrete methodology for sizing your commitment based on your actual workload patterns, not generic advice that ignores how ML infrastructure actually behaves.
This is part of my comprehensive AWS Savings Plans guide where I cover all four Savings Plans types including Database Savings Plans for database workloads. Here, I go much deeper on the SageMaker-specific considerations that most guides miss entirely.
TL;DR: SageMaker AI Savings Plans offer up to 64% savings on ML infrastructure in exchange for a 1-3 year $/hour commitment. The key is calculating your commitment based on consistent baseline usage, not your peak training jobs. Start conservative at 70-80% of your baseline, monitor quarterly, and add incrementally.
What Are SageMaker AI Savings Plans?
SageMaker AI Savings Plans use a dollar-per-hour commitment model, not instance-specific reservations. You commit to spending a consistent amount per hour on eligible SageMaker usage for either a 1-year or 3-year term. The commitment can range from $0.001 to $1,000,000 per hour.
What makes this different from EC2 Reserved Instances? You're committing to spend, not to specific instances. This means you can freely move between CPU and GPU instances, shift between regions, and change which SageMaker components you use without losing your discount.
Here's how it works in practice:
- Usage up to your commitment is charged at discounted Savings Plans rates
- Usage beyond your commitment is charged at regular On-Demand rates
- Unused commitment is still billed (use-it-or-lose-it within each hour)
The $/Hour Commitment Model
Think of your commitment as a spending floor, not a ceiling. If you commit to $10/hour:
- In an hour where you use $8 of SageMaker resources, that $8 is billed at the Savings Plans rate. The remaining $2 of commitment is still billed but buys nothing for that hour.
- In an hour where you use $15 of SageMaker resources, you pay $10 at the Savings Plans rate, and the remaining $5 at On-Demand rates.
This hourly independence is critical for ML workloads. Unlike web applications with steady traffic, training jobs create usage spikes followed by idle periods. Your commitment should match your consistent baseline, not your peak training hours.
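The two scenarios above can be expressed as a minimal billing model for a single hour. This is an illustrative sketch in the article's simplified dollar terms, not an AWS API:

```python
def hourly_bill(commitment: float, usage: float) -> dict:
    """Model one hour under a $/hour Savings Plans commitment.

    Simplified: 'usage' is the dollar value of the hour's SageMaker usage,
    and overflow dollars are shown at On-Demand prices, as in the running
    $10/hour example.
    """
    covered = min(usage, commitment)          # billed at discounted SP rates
    overflow = max(usage - commitment, 0.0)   # billed at On-Demand rates
    unused = commitment - covered             # still billed, but wasted this hour
    return {"covered": covered, "overflow": overflow, "unused": unused}

# $10/hour commitment, $8 of usage: $2 of commitment is wasted this hour
print(hourly_bill(10, 8))
# $10/hour commitment, $15 of usage: $5 spills over to On-Demand rates
print(hourly_bill(10, 15))
```

Because each hour is evaluated independently, the wasted portion of one hour can never be reclaimed by a busier hour later in the day.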
Term options:
| Term | Discount Potential | Best For |
|---|---|---|
| 1-year | Lower savings | Evolving architectures, new projects |
| 3-year | Up to 64% | Stable, predictable production models |
Automatic Application Across Your ML Pipeline
The flexibility of SageMaker Savings Plans is their biggest advantage for ML workloads. A single commitment automatically applies across:
- Any instance family: ml.m5, ml.c5, ml.p4d, ml.g5, ml.trn1, ml.inf2
- Any instance size: xlarge, 2xlarge, 4xlarge, and beyond
- Any AWS Region: Shift from us-east-1 to eu-west-1 seamlessly
- Any SageMaker component: Notebooks, training, processing, inference
This means you can change from an ml.c5.xlarge CPU instance in US East (Ohio) to an ml.inf1 inference instance in US West (Oregon) at any time and automatically continue paying the Savings Plans price. No reconfiguration required.
Now that you understand the commitment model, let's see exactly which SageMaker components your commitment covers.
Eligible SageMaker Components
Understanding what's covered and what's not is essential before calculating your commitment. Including non-covered components in your calculations will lead to an oversized commitment and wasted money.
The 7 Covered Components
SageMaker Savings Plans apply to seven components that form the core of most ML pipelines:
| Component | Description | Typical Use Case |
|---|---|---|
| SageMaker Studio Notebook | Managed Jupyter in Studio | Interactive development, EDA |
| SageMaker On-Demand Notebook | Classic notebook instances | Development, experimentation |
| SageMaker Processing | Data preprocessing jobs | Feature engineering, evaluation |
| SageMaker Data Wrangler | Visual data preparation | Data transformation, cleaning |
| SageMaker Training | Model training jobs | Training custom models |
| SageMaker Real-Time Inference | Persistent endpoints | Production inference |
| SageMaker Batch Transform | Batch prediction jobs | Offline scoring |
What's NOT Covered (And Why It Matters)
These components use different pricing models and cannot be included in your commitment calculation:
- SageMaker Serverless Inference: Pay-per-invocation model, not instance-based
- SageMaker HyperPod: Uses Training Plans, a separate pricing mechanism for large-scale foundation model training
- SageMaker Ground Truth: Labeling workforce costs
- SageMaker Feature Store: Storage and request-based pricing
- SageMaker MLflow: Separate pricing structure
- Dedicated Instances: The $2/hour dedicated fee is not discounted
Practical implication: If you're using Serverless Inference for development endpoints or Ground Truth for labeling, exclude those costs from your commitment calculation. They'll continue billing at their native rates regardless of your Savings Plans purchase.
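As a sketch of that practical implication, you can screen a spend breakdown against the covered list before sizing a commitment. The component names below are illustrative labels for this example, not AWS API identifiers:

```python
# The seven covered components, as labels for this illustration only
ELIGIBLE = {
    "studio_notebook", "on_demand_notebook", "processing",
    "data_wrangler", "training", "real_time_inference", "batch_transform",
}

def eligible_hourly_spend(spend_by_component: dict[str, float]) -> float:
    """Sum only the spend that a SageMaker Savings Plan can discount."""
    return sum(v for k, v in spend_by_component.items() if k in ELIGIBLE)

spend = {
    "training": 12.0,
    "real_time_inference": 8.0,
    "serverless_inference": 3.0,   # not covered: pay-per-invocation
    "ground_truth": 5.0,           # not covered: labeling workforce costs
}
print(eligible_hourly_spend(spend))  # only training + inference count
```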
Supported Instance Families
SageMaker Savings Plans cover the full range of ML instance types:
- General Purpose: ml.m4 through ml.m7i
- Compute Optimized: ml.c4 through ml.c7i
- Memory Optimized: ml.r5 through ml.r7i
- GPU Instances: ml.p2 through ml.p5en, ml.g4dn through ml.g6
- AWS Trainium: ml.trn1, ml.trn1n, ml.trn2
- AWS Inferentia: ml.inf1, ml.inf2
This includes the latest Trainium and Inferentia chips, which are increasingly popular for training and inference cost optimization. Your commitment remains valid even if you migrate from P4 GPUs to Trainium accelerators.
Discount Rates: What "Up to 64%" Actually Means
The "up to 64%" headline number requires context. That's the maximum discount with a 3-year All Upfront commitment on specific instance types. Your actual discount will vary.
Factors That Affect Your Discount
Four factors determine your actual savings percentage:
- Instance type and family: Different instance types have different Savings Plans rates
- Region: Pricing varies by AWS region
- Term length: 3-year terms offer higher discounts than 1-year
- Payment option: All Upfront > Partial Upfront > No Upfront
AWS doesn't publish a comprehensive discount table for every permutation. The 64% maximum is your ceiling, not your expectation. Most organizations see effective discounts in the 40-60% range depending on their instance mix.
Term Length Impact
The term length decision for ML workloads deserves careful consideration:
1-year terms make sense when:
- Your model architectures are evolving rapidly
- You're exploring new instance types (like Trainium or Inferentia)
- Your workload patterns haven't stabilized yet
- You're experimenting with different inference approaches
3-year terms make sense when:
- You have stable production models in steady state
- Your instance type preferences are established
- Maximum discount is the priority
- Your ML platform architecture is mature
My recommendation for ML workloads: Start with 1-year terms. ML evolves fast. New instance types (ml.trn2, ml.p5en) and architectures (foundation models, edge inference) can dramatically shift your infrastructure needs. The flexibility premium is often worth the slightly lower discount.
Payment Option Comparison
| Payment Option | How It Works | Discount Level | Cash Flow Impact |
|---|---|---|---|
| No Upfront | No upfront payment, billed monthly | Lowest | Best for cash flow |
| Partial Upfront | At least 50% upfront, remainder monthly | Middle | Balanced |
| All Upfront | Full payment at purchase | Highest (64%) | Requires capital |
For ML platform teams managing budget across fiscal years, No Upfront often makes sense despite the lower discount. It avoids large capital outlays and simplifies budget tracking.
Calculating Your Commitment (The ML-Specific Approach)
This is where most guides fail. Generic Savings Plans advice tells you to "analyze your usage and commit to your baseline." But ML workloads have unique characteristics that require a different approach.
Why ML Workloads Need Different Sizing
ML infrastructure behaves differently from typical web applications:
- Training jobs are bursty: A multi-hour training job spins up 8 ml.p4d instances, then nothing for days
- Inference can scale dramatically: Production endpoints auto-scale from 2 to 20 instances based on traffic
- Experimentation creates variability: Research teams explore new architectures with unpredictable usage
- GPU instances have high hourly rates: A single ml.p4d.24xlarge costs over $30/hour On-Demand, making mistakes expensive
Generic EC2 Savings Plans guidance doesn't account for these patterns. Committing based on a simple 30-day average will likely leave you either over-committed (paying for unused hours) or under-committed (missing savings on your actual baseline).
Training-Heavy vs Inference-Heavy Workloads
Your workload composition determines your commitment strategy:
Training-heavy workloads (batch processing, model development):
- Usage comes in bursts with significant idle time
- Commit based on average daily training hours, not peaks
- Consider Managed Spot Training for non-critical training (covered later)
- Your baseline is likely lower than it appears
Inference-heavy workloads (production endpoints):
- Usage is more consistent but still scales with traffic
- Commit based on minimum baseline endpoint capacity
- Account for auto-scaling (scale-up uses On-Demand beyond your commitment)
- More predictable than training, but don't commit to max capacity
Mixed workloads (most ML platforms):
- Prioritize the most consistent component for your commitment
- Often inference provides the steadiest baseline
- Leave training burst capacity for On-Demand or Spot
Step-by-Step Commitment Calculation Framework
Here's the methodology I recommend for ML workloads:
Step 1: Gather 60 days of SageMaker usage data
Open AWS Cost Explorer and filter by Service: Amazon SageMaker. Look at your daily spend over the past 60 days minimum. This captures weekly and monthly patterns that 7 or 30 days might miss.
Step 2: Identify your baseline daily spend
Remove outlier days (major training runs, one-time experiments). Calculate your average daily spend excluding the top 10% of days. This is your "realistic baseline," not inflated by occasional bursts.
Step 3: Calculate your hourly baseline
Hourly Baseline = Baseline Daily Spend (from Step 2) / 24
Step 4: Apply a conservative factor (70-80%)
For your initial commitment, multiply your hourly baseline by 0.70 to 0.80. This provides margin for usage variability and ensures you don't over-commit. (Note that the commitment itself is denominated in discounted Savings Plans rates, so a figure derived from On-Demand spend is already conservative; this factor adds further margin.)
Initial Commitment = Hourly Baseline × 0.75
Step 5: Validate with AWS recommendations
Navigate to Billing Console > Savings Plans > Recommendations. Select SageMaker Savings Plans and compare AWS's recommendation against your calculation. If they differ significantly, investigate which usage patterns you might have missed.
Step 6: Plan for quarterly review
Don't treat your initial purchase as final. Schedule quarterly reviews to assess utilization and add incremental commitment as your usage patterns stabilize.
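Steps 2 through 4 of the framework above can be sketched as a small function. The 10% outlier cut and 0.75 conservative factor are the same assumed parameters as in the text:

```python
def size_commitment(daily_spend: list[float],
                    outlier_fraction: float = 0.10,
                    conservative_factor: float = 0.75) -> float:
    """Size an initial $/hour commitment from daily spend history.

    Steps 2-4: drop the top 10% of days as outliers, average the rest,
    convert to an hourly figure, then apply a conservative factor.
    """
    ranked = sorted(daily_spend)
    n_drop = max(1, round(len(ranked) * outlier_fraction))
    kept = ranked[: len(ranked) - n_drop]          # exclude the top-spend days
    baseline_daily = sum(kept) / len(kept)         # realistic daily baseline
    hourly_baseline = baseline_daily / 24
    return hourly_baseline * conservative_factor

# Nine ordinary $720 days plus one $2,400 training-burst day
print(size_commitment([720] * 9 + [2400]))  # matches the worked example below
```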
Real Example: Mixed ML Workload Sizing
Let me walk through a concrete scenario:
Team profile:
- Daily training jobs on ml.p4d.24xlarge (4 hours average)
- Production inference endpoints on ml.c5.2xlarge (24/7, 2-instance baseline)
- Processing jobs on ml.m5.4xlarge (2 hours average)
- Studio notebooks (sporadic)
60-day usage analysis:
- Average daily spend: $850
- Peak days (top 10%): $2,400 (large training runs)
- Baseline daily spend (excluding peaks): $720
Calculation:
Hourly Baseline = $720 / 24 = $30/hour
Conservative Commitment = $30 × 0.75 = $22.50/hour
Decision: Start with a $22/hour commitment on a 1-year No Upfront plan.
Expected outcome:
- Base coverage: $22/hour at Savings Plans rates (approximately 50% discount)
- Peak training days: Use On-Demand for burst capacity beyond commitment
- Annual savings: Approximately $48,000 compared to full On-Demand
Quarterly review: If utilization consistently exceeds 90%, add another $5-10/hour commitment.
SageMaker Savings Plans vs Managed Spot Training
Before finalizing your commitment, you should understand how Managed Spot Training fits into your cost optimization strategy. These aren't competing approaches. They're complementary.
When to Use Each
Savings Plans are better when:
- Training jobs have SLAs or time constraints
- You need guaranteed completion for production pipelines
- Workloads run on predictable schedules
- You have consistent inference endpoints
Managed Spot Training is better when:
- Jobs can tolerate interruptions (with checkpointing)
- You're running hyperparameter tuning experiments
- Cost matters more than completion time
- Workloads are fault-tolerant by design
The numbers tell the story:
| Approach | Maximum Savings | Reliability | Commitment |
|---|---|---|---|
| SageMaker Savings Plans | Up to 64% | Guaranteed | 1-3 years |
| Managed Spot Training | Up to 90% | Interruptible | None |
Can You Combine Them?
Yes, and you should. The optimal strategy uses both:
- Savings Plans cover your consistent baseline (inference endpoints, production training)
- Managed Spot Training handles burst and experimental workloads
Spot usage doesn't consume your Savings Plans commitment; Spot is billed through a separate pricing mechanism. This means you can maximize savings across your entire ML workload by using each where it fits best.
Implementation tip: Enable checkpointing for any training job longer than 1 hour when using Spot. Set EnableManagedSpotTraining to True and configure MaxWaitTimeInSeconds greater than MaxRuntimeInSeconds to allow for spot interruption recovery.
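A hedged sketch of the relevant boto3 `create_training_job` fields for that tip follows. Only the spot and checkpoint settings are the point here; the S3 URI and local path are placeholders for this example:

```python
def spot_training_params(max_runtime_s: int, extra_wait_s: int,
                         checkpoint_s3_uri: str) -> dict:
    """Build the Managed Spot Training portion of a create_training_job request.

    MaxWaitTimeInSeconds must exceed MaxRuntimeInSeconds so the job has
    headroom to recover from spot interruptions via checkpoints.
    """
    assert extra_wait_s > 0, "wait time must exceed runtime for spot recovery"
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            # Longer than the runtime: budget for interruption + restart
            "MaxWaitTimeInSeconds": max_runtime_s + extra_wait_s,
        },
        "CheckpointConfig": {
            "S3Uri": checkpoint_s3_uri,          # placeholder bucket/prefix
            "LocalPath": "/opt/ml/checkpoints",  # where the container writes
        },
    }

params = spot_training_params(7200, 3600, "s3://example-bucket/checkpoints/")
print(params["StoppingCondition"])
```

These keys would be merged into the rest of your `create_training_job` call (algorithm, inputs, role, and so on), which is omitted here.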
Decision Framework
Use this framework to decide which approach applies to each workload:
- Can the job checkpoint and tolerate interruption? Use Managed Spot Training.
- Is usage consistent and schedule-driven (production endpoints, recurring training)? Cover it with your Savings Plans commitment.
- Neither predictable nor interruption-tolerant? Leave it On-Demand.
Common Mistakes to Avoid
I've seen these mistakes repeatedly in ML teams adopting Savings Plans. Each one can cost thousands in wasted commitment or missed savings.
Committing Based on Training Peaks
The mistake: Your Cost Explorer shows $3,000 peak days from large training runs. You commit based on that peak.
The reality: Those peaks represent burst usage, not baseline. A 4-hour training job on 8 ml.p4d instances creates a massive spike, then nothing. Committing to peak means paying for 20 hours of unused commitment each day.
The fix: Calculate your average daily spend, exclude the top 10% of days, then convert to hourly. This captures your actual baseline.
Ignoring Spot for Fault-Tolerant Training
The mistake: You commit Savings Plans budget to all training workloads, including experimental and hyperparameter tuning jobs.
The reality: Managed Spot Training offers up to 90% savings for workloads that can handle interruptions. That's significantly better than the 64% maximum from Savings Plans.
The fix: Reserve your Savings Plans commitment for production training and inference. Use Spot for experiments, hyperparameter sweeps, and any training that can checkpoint.
Not Accounting for Inference Auto-Scaling
The mistake: Your inference endpoints auto-scale from 2 to 10 instances during traffic spikes. You commit based on your maximum capacity.
The reality: Auto-scaling means your baseline is the minimum instance count, not the maximum. Committing to max capacity wastes money during low-traffic hours.
The fix: Commit based on minimum baseline capacity. Let auto-scaled instances use On-Demand beyond your commitment. Also consider the scale-to-zero feature for development endpoints.
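As an illustration of that fix, the inference share of a commitment can be sized from the endpoint's minimum capacity. The hourly rate here is a placeholder for this example, not a published AWS price:

```python
def inference_baseline_commitment(min_instances: int,
                                  od_hourly_rate: float,
                                  conservative_factor: float = 0.75) -> float:
    """Size the inference share of a commitment from the endpoint's
    *minimum* auto-scaling capacity, never its maximum.

    Instances added by auto-scaling beyond this baseline simply bill
    at On-Demand rates.
    """
    return min_instances * od_hourly_rate * conservative_factor

# 2-instance baseline at an assumed $0.50/hour per instance
print(inference_baseline_commitment(2, 0.50))
```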
Over-Committing Before Workloads Stabilize
The mistake: A new ML project launches and immediately purchases a large Savings Plans commitment based on projected usage.
The reality: New projects have unpredictable usage patterns. Model architectures change. Instance needs evolve. Early projections are rarely accurate.
The fix: Wait for usage patterns to stabilize (60+ days of data minimum). Start conservative at 70-80% of observed baseline. Add incrementally after quarterly reviews.
Safety net: Commitments of $100/hour or less can be returned within 7 days if purchased in the same calendar month. Use this for smaller initial purchases while you validate your calculations.
When NOT to Use SageMaker Savings Plans
Savings Plans aren't always the right choice. Sometimes On-Demand or Spot is more cost-effective. Here's when to avoid or minimize your commitment.
Experimental/Research Workloads
If your team runs sporadic experiments driven by research timelines rather than production schedules, Savings Plans may not make sense:
- Usage is highly variable week to week
- Instance types change as you explore architectures
- There's no consistent baseline to commit against
Better alternatives:
- Managed Spot Training for experiments (up to 90% savings)
- Serverless Inference for development endpoints
- On-Demand for truly unpredictable workloads
Wait until workloads move to production before committing.
Short-Term Projects
Projects lasting less than 1 year can't benefit from Savings Plans:
- The minimum term is 1 year
- On-Demand may be cheaper than an underutilized commitment
- Factor project timeline into any commitment decision
Highly Variable Usage Patterns
If you can't identify a consistent baseline, don't commit:
- Usage that varies 50%+ week-to-week is too volatile
- Savings Plans require predictable hourly usage to be effective
- Gather more data before considering a commitment
The honest answer for variable workloads: On-Demand flexibility is worth more than forced savings on a commitment you can't fully use.
How to Purchase SageMaker Savings Plans
Once you've calculated your commitment, the purchase process is straightforward.
Using AWS Recommendations
AWS provides customized recommendations based on your historical usage:
- Open the Billing and Cost Management Console
- Navigate to Savings Plans > Recommendations
- Select SageMaker Savings Plans as the plan type
- Choose your term length (1-year or 3-year)
- Select your payment option (No Upfront, Partial, All Upfront)
- Review the recommended commitment based on your lookback period (7, 30, or 60 days)
The recommendation metrics include:
- Monthly On-Demand spend: What you'd pay without Savings Plans
- Estimated monthly spend: Projected spend with the recommendation
- Estimated monthly savings: Net savings from the purchase
Pro tip: Compare recommendations across all three lookback periods. The 60-day lookback usually provides the most accurate baseline for ML workloads.
Savings Plans Purchase Analyzer
For more control, use the Savings Plans Purchase Analyzer:
- Model custom commitment amounts before purchasing
- Estimate impact on cost, coverage, and utilization
- Compare multiple scenarios side-by-side
- Validate your manual calculations against AWS projections
Purchase via Console, CLI, or API
Console: Billing and Cost Management > Savings Plans > Purchase
AWS CLI:
# Find available SageMaker Savings Plans offerings (1-year, No Upfront)
aws savingsplans describe-savings-plans-offerings \
    --plan-types "SageMaker" \
    --durations 31536000 \
    --payment-options "No Upfront"
# Purchase the Savings Plan
aws savingsplans create-savings-plan \
    --savings-plan-offering-id "offering-id-from-above" \
    --commitment "22.00" \
    --tags Environment=Production,Team=MLPlatform
Important limitation: Savings Plans cannot be purchased via CloudFormation or Terraform. The purchase must go through the console, CLI, or SDK. However, you can use IaC to create budgets and alerts for monitoring.
Monitoring and Optimizing Utilization
Purchasing is just the beginning. Ongoing monitoring ensures you get full value from your commitment.
Tracking SageMaker-Specific Utilization
Monitor your Savings Plans utilization in Cost Explorer:
- Navigate to Cost Explorer > Savings Plans > Utilization report
- Filter by Service: Amazon SageMaker
- Key metric: Used commitment vs. total commitment
Target: 80%+ utilization. Below 80% suggests you over-committed. Above 95% consistently suggests room for additional commitment.
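Those thresholds can be encoded as a simple check for a monitoring script. A minimal sketch:

```python
def utilization_signal(used_commitment: float, total_commitment: float) -> str:
    """Interpret monthly Savings Plans utilization per the thresholds above."""
    pct = 100.0 * used_commitment / total_commitment
    if pct < 80.0:
        return "over-committed"   # below 80%: commitment likely too large
    if pct > 95.0:
        return "room-to-add"      # consistently above 95%: consider adding
    return "healthy"

# e.g. $19.80 of a $22/hour commitment used on average this month
print(utilization_signal(19.80, 22.00))
```

A single month above 95% isn't a signal by itself; check that the pattern holds across a quarter before acting.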
Setting Up AWS Budgets for ML Workloads
Create utilization and coverage budgets to catch problems early:
# Terraform: Monitor Savings Plans utilization
resource "aws_budgets_budget" "sagemaker_sp_utilization" {
  name         = "sagemaker-savings-plans-utilization"
  budget_type  = "SAVINGS_PLANS_UTILIZATION"
  limit_amount = "100"
  limit_unit   = "PERCENTAGE"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "Service"
    values = ["Amazon SageMaker"]
  }

  notification {
    comparison_operator        = "LESS_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["finops-team@example.com"]
  }
}
# Terraform: Monitor coverage to identify expansion opportunities
resource "aws_budgets_budget" "sagemaker_sp_coverage" {
  name         = "sagemaker-savings-plans-coverage"
  budget_type  = "SAVINGS_PLANS_COVERAGE"
  limit_amount = "80"
  limit_unit   = "PERCENTAGE"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "Service"
    values = ["Amazon SageMaker"]
  }

  notification {
    comparison_operator        = "LESS_THAN"
    threshold                  = 70
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["finops-team@example.com"]
  }
}
When to Add More Commitment
Add incremental commitment when:
- Utilization consistently exceeds 95%: You have room for more
- Coverage is below 70%: Significant On-Demand spend could be covered
- Usage patterns have stabilized: New workloads are now predictable
- Quarterly review confirms trend: Not just a temporary spike
Add incrementally (10-20% increases) rather than making large jumps. This allows you to course-correct if patterns change.
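One way to encode that quarterly review rule of thumb combines the first two signals; the 15% step and the thresholds below mirror the guidance above and can be adjusted:

```python
def quarterly_action(utilization_pct: float, coverage_pct: float,
                     current_commitment: float) -> tuple[str, float]:
    """Suggest a quarterly commitment action.

    Add a 10-20% increment (15% here) only when utilization is consistently
    high AND meaningful On-Demand spend remains uncovered.
    """
    if utilization_pct >= 95 and coverage_pct < 70:
        return ("add", round(current_commitment * 0.15, 2))
    if utilization_pct < 80:
        return ("hold-and-investigate", 0.0)  # likely over-committed
    return ("hold", 0.0)

# 96% utilization, 65% coverage, $22/hour commitment -> add ~$3.30/hour
print(quarterly_action(96, 65, 22.0))
```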
Multi-Account Considerations
For ML platform teams supporting multiple data science groups, multi-account Savings Plans management adds complexity.
Sharing Across AWS Organizations
By default, SageMaker Savings Plans benefits apply to all accounts within your AWS Organization. This is usually what ML platform teams want - centralized purchasing with organization-wide benefits.
Options for managing sharing:
- Shared (default): Benefits apply to all linked accounts
- Restricted: Benefits apply only to the purchasing account
- Cost Categories: Group accounts and purchase at group level
Application order within Organizations: Savings Plans apply first to the owner account's usage, then to other accounts if capacity remains.
Cost Allocation for ML Teams
For chargeback to business units:
- Enable cost allocation tags: Project, Team, Environment, Owner
- Use Cost Categories to group accounts by team or business unit
- Leverage the Cost and Usage Report's SavingsPlanEffectiveCost column for accurate chargeback
Important: Chargeback should be based on actual usage, not commitment allocation. The commitment is a centralized investment; the benefit is distributed by usage.
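As an illustration, usage-based chargeback might aggregate SavingsPlanEffectiveCost by a Team tag. The row shape below is an assumption sketching a flattened Cost and Usage Report extract, not an AWS SDK response:

```python
def chargeback_by_team(cur_rows: list[dict]) -> dict[str, float]:
    """Aggregate Savings Plans effective cost per team for chargeback.

    cur_rows: assumed flattened CUR rows keyed by CUR column names,
    e.g. 'resourceTags/user:Team' and 'savingsPlan/SavingsPlanEffectiveCost'.
    """
    totals: dict[str, float] = {}
    for row in cur_rows:
        team = row.get("resourceTags/user:Team", "untagged")
        cost = row["savingsPlan/SavingsPlanEffectiveCost"]
        totals[team] = totals.get(team, 0.0) + cost
    return totals

rows = [
    {"resourceTags/user:Team": "nlp",
     "savingsPlan/SavingsPlanEffectiveCost": 5.0},
    {"resourceTags/user:Team": "cv",
     "savingsPlan/SavingsPlanEffectiveCost": 2.5},
    {"resourceTags/user:Team": "nlp",
     "savingsPlan/SavingsPlanEffectiveCost": 1.5},
]
print(chargeback_by_team(rows))
```

Because effective cost already reflects the Savings Plans discount, each team is charged for what it actually consumed, not for a slice of the central commitment.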
For more on multi-account strategies, see the AWS Organizations guide.
Conclusion
SageMaker AI Savings Plans are a powerful tool for ML cost optimization, but they require ML-specific thinking:
- Up to 64% savings with maximum flexibility across instance types, regions, and SageMaker components
- Calculate commitment based on consistent baseline, not training peaks or max inference capacity
- Combine Savings Plans with Managed Spot Training for maximum savings across your ML workload
- Start conservative at 70-80% of baseline and add incrementally after quarterly reviews
- Variable or experimental workloads may be better served by Spot Training or On-Demand
Your next step: Open Cost Explorer, analyze your last 60 days of SageMaker usage, identify your consistent baseline (excluding peak training days), and use AWS recommendations to validate your commitment amount. Start with a 1-year No Upfront plan at 75% of your calculated baseline.
For the complete picture of all Savings Plans types, including how SageMaker fits alongside Compute, EC2 Instance, and Database Savings Plans, read the comprehensive AWS Savings Plans guide.
See Infrastructure Costs in Code Review, Not on Your AWS Bill
CloudBurn automatically analyzes your Terraform and AWS CDK changes, showing cost estimates directly in pull requests. Catch expensive decisions during code review when they take seconds to fix, not weeks later in production.