Cost Optimization
Master Kubernetes FinOps with modern tools, strategies, and best practices for reducing infrastructure costs without sacrificing performance.
The Cost Challenge
Kubernetes infrastructure costs can spiral quickly without active optimization. Common issues include over-provisioned resources (waste of 70-80% is typical), underutilized nodes, expensive persistent storage, and egress bandwidth charges. Modern FinOps practices combine automation, visibility, and intelligent workload placement, and can reduce costs by 40-60%.
Cost Monitoring & Visibility (2025)
OpenCost (CNCF Incubating)
Open-source Kubernetes cost monitoring with real-time allocation
Key Features:
- Real-time cost allocation by namespace, pod, label
- Multi-cloud support (AWS, GCP, Azure)
- Integration with Prometheus and cloud billing APIs
- Cost forecasting and budget alerts
- 100% free and open-source
Best for: Organizations wanting full control and transparency
Kubecost (Commercial + Free Tier)
Enterprise-grade cost management with optimization recommendations
Key Features:
- Detailed cost breakdown by cluster, namespace, deployment
- Rightsizing recommendations with savings estimates
- Unified multi-cluster cost visibility
- Container cost allocation and chargeback
- Free tier for single cluster
Best for: Enterprises with multi-cluster deployments
Cloud Provider Native Tools
Built-in cost management from cloud providers
Key Features:
- AWS Cost Explorer with EKS filtering
- GCP Cloud Billing with GKE labels
- Azure Cost Management for AKS
- Native integration with cloud services
- Detailed billing reports and forecasting
Best for: Single-cloud deployments with existing cloud investments
FinOps Foundation Tools
FOCUS-compliant tools for standardized cost reporting
Key Features:
- Standardized FinOps Open Cost & Usage Specification (FOCUS)
- Cloud cost normalization across providers
- Team attribution and showback/chargeback
- Cost anomaly detection
- Integration with ITFM/ITSM systems
Best for: Large organizations with FinOps teams
Intelligent Node Management
Karpenter: Next-Gen Autoscaling
Karpenter (a Kubernetes SIG Autoscaling project) provides fast, intelligent node provisioning that can substantially reduce costs compared with the traditional Cluster Autoscaler.
Key Benefits
- Provisions nodes in seconds vs minutes
- Bin-packing optimization for cost efficiency
- Automatic instance type selection
- Consolidation to reduce node count
- Native support for spot/preemptible instances
Cost Savings
- 40-60% reduction vs on-demand only
- Automatic rightsizing of node types
- Intelligent spot instance usage
- Reduced over-provisioning
- Lower data transfer costs via topology
Example Karpenter NodePool:
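apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # Remove or replace nodes that are empty or underutilized
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Illustrative wait before consolidating; tune for your workloads
    consolidateAfter: 1m
  template:
    spec:
      # karpenter.sh/v1 requires a node class reference.
      # Assumption: an EC2NodeClass named "default" already exists (AWS provider).
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Allow both spot and on-demand capacity
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Restrict provisioning to a small set of instance types
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["c6i.large", "c6i.xlarge", "c7i.large"]
Spot/Preemptible Instances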
60-90% discount
Use interruptible instances for fault-tolerant workloads
- Mix with on-demand for stability
- Use multiple instance types for diversity
- Implement graceful shutdown handling
- Perfect for batch jobs, CI/CD, dev/test
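As a minimal sketch of the graceful-shutdown and stability points above, the manifests below assume a hypothetical `worker` Deployment running on Karpenter-provisioned spot nodes. The names, image, replica count, and timing values are illustrative. The 60-second grace period is chosen to fit inside the two-minute interruption notice that AWS Spot provides; other providers may give less.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker                      # hypothetical workload name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      # Schedule only onto spot capacity provisioned by Karpenter
      nodeSelector:
        karpenter.sh/capacity-type: spot
      # Time allowed between SIGTERM and SIGKILL; keep it under the provider's notice window
      terminationGracePeriodSeconds: 60
      containers:
        - name: worker
          image: registry.example.com/worker:1.0   # placeholder image
          lifecycle:
            preStop:
              exec:
                # Pause briefly so traffic drains before the process receives SIGTERM
                # (assumes the image ships a `sleep` binary)
                command: ["sleep", "10"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  minAvailable: 2                   # keep at least two replicas during voluntary disruptions
  selector:
    matchLabels:
      app: worker
```

A PodDisruptionBudget only limits voluntary evictions such as node drains and consolidation. It cannot stop the provider from reclaiming an instance, so replicas should still be spread across several nodes and instance types.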
Savings Plans & Reserved Instances
30-70% discount
Commit to compute usage for predictable workloads
- Analyze baseline compute needs first
- Start with 1-year commitments
- Use Compute Savings Plans for flexibility
- Monitor utilization to maximize ROI
Resource Right-Sizing
Vertical Pod Autoscaler (VPA)
Automatically adjusts CPU and memory requests/limits based on actual usage patterns.
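- Recommendation-only mode (updateMode "Off") provides actionable insights without changing pods
- Auto mode applies the recommendations automatically, typically by evicting and recreating pods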
- Reduces waste from over-provisioning
- Can reduce resource requests by 30-50%
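A recommendation-only setup might look like the sketch below. It assumes the VPA components are already installed in the cluster (they are not part of core Kubernetes) and targets a hypothetical Deployment named `web`; the min/max bounds are illustrative values, not recommendations.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # hypothetical workload
  updatePolicy:
    # "Off" only publishes recommendations; switch to "Auto" to let VPA apply them
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # apply the same bounds to every container
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Once the recommender has observed enough usage, the suggested values appear in the object's status, for example via `kubectl describe vpa web-vpa`.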
Goldilocks (FairwindsOps)
Dashboard for VPA recommendations across all workloads with quality-of-service recommendations.
- Visualize VPA recommendations per namespace
- Compare current vs recommended settings
- Export recommendations as Kubernetes manifests
- Open-source and easy to deploy
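Goldilocks creates VPA objects for workloads in namespaces that opt in through a label. Assuming Goldilocks and the VPA are both installed, a hypothetical `payments` namespace could be enabled like this:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                              # hypothetical namespace
  labels:
    # Opt this namespace in to Goldilocks recommendations
    goldilocks.fairwinds.com/enabled: "true"
```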
Resource Requests & Limits Best Practices
✓ DO
- Set requests based on actual P95 usage
- Use limits only when necessary (CPU limits are often best avoided because they cause throttling)
- Monitor and adjust based on real data
- Use QoS classes intentionally (Guaranteed, Burstable, BestEffort)
- Set memory limits to contain runaway memory usage and protect neighboring pods from node-level OOM kills
✗ DON'T
- Copy-paste requests from examples
- Set requests equal to limits without measuring
- Ignore VPA recommendations
- Over-provision "just to be safe"
- Forget to set requests at all
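The practices above can be sketched in a container spec. The workload name, image, and numbers below are hypothetical; in practice the requests would come from measured P95 usage or VPA recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                                   # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0 # placeholder image
          resources:
            requests:
              cpu: 250m        # set near measured P95 CPU usage
              memory: 256Mi    # set near measured P95 memory usage
            limits:
              memory: 512Mi    # cap memory; no CPU limit, so the container can burst
```

Because requests and limits differ, these pods fall into the Burstable QoS class. Setting requests equal to limits for both CPU and memory would make them Guaranteed instead.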
Storage & Network Cost Reduction
Storage Optimization
Storage costs reduced by 40-70%
- Use cheaper storage classes (gp3 vs io2, standard vs ssd)
- Implement PVC resize for growing workloads vs over-provisioning
- Delete unused PVCs (often forgotten after pod deletion)
- Use object storage (S3, GCS) for logs/backups vs block storage
- Implement storage lifecycle policies for automatic cleanup
- Compress data and use deduplication where possible
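As one way to combine a cheaper storage class with PVC resizing, the sketch below assumes the AWS EBS CSI driver is installed; the class name, claim name, and sizes are hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-expandable                 # hypothetical class name
provisioner: ebs.csi.aws.com           # assumes the AWS EBS CSI driver
parameters:
  type: gp3
allowVolumeExpansion: true             # lets PVCs grow later instead of over-provisioning up front
reclaimPolicy: Delete                  # remove the volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                           # hypothetical claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3-expandable
  resources:
    requests:
      storage: 20Gi                    # start small; raise this value to expand the volume
```

Volumes can be expanded this way but not shrunk, which is why starting with a small request is the cheaper default.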
Network & Egress Optimization
Egress costs reduced by 50-80%
- Use private connectivity (AWS PrivateLink, Azure Private Link) for inter-service communication
- Minimize cross-region/cross-AZ traffic with topology-aware routing
- Implement caching layers (Redis, CDN) to reduce origin requests
- Use VPC endpoints for AWS services to avoid NAT gateway costs
- Compress data in transit (gzip, brotli)
- Regional data residency to avoid unnecessary data movement
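Topology-aware routing for in-cluster traffic can be requested per Service. The sketch below assumes Kubernetes 1.27 or later, where the `service.kubernetes.io/topology-mode` annotation is recognized; the Service name, selector, and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api                                    # hypothetical service
  annotations:
    # Ask Kubernetes to prefer endpoints in the client's own zone
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```

Zone hints are only applied when endpoints are spread reasonably evenly across zones, so this works best for services with several replicas in each zone.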
FinOps Best Practices (2025)
Cost Governance Framework
Visibility
- Real-time cost dashboards
- Label all resources (team, env, app)
- Chargeback/showback reports
- Cost anomaly detection
Optimization
- Automated rightsizing
- Spot instance strategies
- Resource quotas per namespace
- Idle resource cleanup
Governance
- Budget alerts and enforcement
- Policy-driven automation
- Regular cost reviews
- FinOps team ownership
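Labels and per-namespace quotas from the framework above can be expressed directly as manifests. This is a sketch under assumed names (`team-a`, `prod`) and illustrative limits, not sizing guidance.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                  # hypothetical team namespace
  labels:
    team: team-a                # labels used for cost attribution
    env: prod
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"          # total CPU the namespace may request
    requests.memory: 40Gi
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:           # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                  # applied when a container omits a memory limit
        memory: 256Mi
```

The LimitRange matters here because a quota on `requests.*` or `limits.memory` rejects pods that leave those values unset; the defaults keep such pods schedulable while still counting against the team's budget.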
Key Takeaways
- Use OpenCost or Kubecost for real-time visibility into Kubernetes spending
- Karpenter can deliver 40-60% cost savings through intelligent node management and spot instances
- Right-size resources with VPA and Goldilocks based on actual usage, not guesses
- Implement FinOps governance with budgets, labels, and automated policies
- Optimize storage and network costs, which are often overlooked but can account for 30% of total spend