Cost Optimization

Master Kubernetes FinOps with modern tools, strategies, and best practices for reducing infrastructure costs without sacrificing performance.

The Cost Challenge

Kubernetes infrastructure costs can spiral quickly without active optimization. Common causes include over-provisioned resources (waste of 70-80% is commonly reported), underutilized nodes, expensive persistent storage, and egress bandwidth charges. Modern FinOps practices combine automation, visibility, and intelligent workload placement, and can reduce costs by 40-60%.

Cost Monitoring & Visibility (2025)

🆓

OpenCost (CNCF Incubating)

Open-source Kubernetes cost monitoring with real-time allocation

Key Features:

• Real-time cost allocation by namespace, pod, label
• Multi-cloud support (AWS, GCP, Azure)
• Integration with Prometheus and cloud billing APIs
• Cost forecasting and budget alerts
• 100% free and open-source

Best for: Organizations wanting full control and transparency

💼

Kubecost (Commercial + Free Tier)

Enterprise-grade cost management with optimization recommendations

Key Features:

• Detailed cost breakdown by cluster, namespace, deployment
• Rightsizing recommendations with savings estimates
• Unified multi-cluster cost visibility
• Container cost allocation and chargeback
• Free tier for single cluster

Best for: Enterprises with multi-cluster deployments

☁️

Cloud Provider Native Tools

Built-in cost management from cloud providers

Key Features:

• AWS Cost Explorer with EKS filtering
• GCP Cloud Billing with GKE labels
• Azure Cost Management for AKS
• Native integration with cloud services
• Detailed billing reports and forecasting

Best for: Single-cloud deployments with existing cloud investments

📊

FinOps Foundation Tools

FOCUS-compliant tools for standardized cost reporting

Key Features:

• Standardized FinOps Open Cost & Usage Specification (FOCUS)
• Cloud cost normalization across providers
• Team attribution and showback/chargeback
• Cost anomaly detection
• Integration with ITFM/ITSM systems

Best for: Large organizations with FinOps teams

Intelligent Node Management

Karpenter: Next-Gen Autoscaling

Karpenter, an open-source node autoscaler maintained under Kubernetes SIG Autoscaling, provisions right-sized nodes quickly and can substantially reduce costs compared to the traditional Cluster Autoscaler.

Key Benefits

• Provisions nodes in seconds vs minutes
• Bin-packing optimization for cost efficiency
• Automatic instance type selection
• Consolidation to reduce node count
• Native support for spot/preemptible instances

Cost Savings

• 40-60% reduction vs on-demand only
• Automatic rightsizing of node types
• Intelligent spot instance usage
• Reduced over-provisioning
• Lower cross-zone data transfer costs via topology-aware scheduling

Example Karpenter NodePool:

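apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # Remove empty nodes and repack underutilized ones onto fewer or cheaper nodes
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      # nodeClassRef is required in karpenter.sh/v1.
      # Assumption: an AWS EC2NodeClass named "default" already exists.
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Allow both spot and on-demand capacity
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Allow-list of instance types; a wider list improves spot availability
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["c6i.large", "c6i.xlarge", "c7i.large"]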

⚡

Spot/Preemptible Instances

60-90% discount

Use interruptible instances for fault-tolerant workloads

• Mix with on-demand for stability
• Use multiple instance types for diversity
• Implement graceful shutdown handling
• Perfect for batch jobs, CI/CD, dev/test
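
As a minimal sketch of the points above, the Deployment below pins a fault-tolerant worker to spot capacity and gives it time to shut down cleanly. It assumes nodes carry the karpenter.sh/capacity-type label (set by Karpenter on the nodes it provisions) and that the container image includes a shell. The name batch-worker, the image, and the timing values are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker              # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Assumption: nodes are labeled by Karpenter with their capacity type
      nodeSelector:
        karpenter.sh/capacity-type: spot
      # Total time allowed for preStop plus SIGTERM handling before SIGKILL
      terminationGracePeriodSeconds: 60
      containers:
        - name: worker
          image: registry.example.com/batch-worker:1.0   # placeholder image
          lifecycle:
            preStop:
              exec:
                # Short pause before SIGTERM is sent, so in-flight work can finish
                command: ["sh", "-c", "sleep 10"]
```

Running several replicas matters as much as the shutdown hook: when one spot node is reclaimed, the remaining replicas keep processing while a replacement is scheduled.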

🎯

Savings Plans & Reserved Instances

30-70% discount

Commit to compute usage for predictable workloads

• Analyze baseline compute needs first
• Start with 1-year commitments
• Use Compute Savings Plans for flexibility
• Monitor utilization to maximize ROI

Resource Right-Sizing

Vertical Pod Autoscaler (VPA)

Automatically adjusts CPU and memory requests/limits based on actual usage patterns.

Off mode (updateMode: "Off") publishes recommendations without changing running pods
Auto mode applies recommendations automatically, typically by evicting and recreating pods
Reduces waste from over-provisioning
Can reduce resource requests by 30-50%
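
A recommendation-only VPA might look like the sketch below. It assumes the VPA components and CRDs are installed in the cluster. The names web-vpa and web, and the min/max bounds, are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                     # hypothetical Deployment to analyze
  updatePolicy:
    updateMode: "Off"             # recommend only; never evict or mutate pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"        # apply the bounds to every container
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Once the recommender has observed some traffic, kubectl describe vpa web-vpa lists the target, lower-bound, and upper-bound values per container. Starting in Off mode lets you review those numbers before allowing automatic updates.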

Goldilocks (FairwindsOps)

Dashboard for VPA recommendations across all workloads with quality-of-service recommendations.

Visualize VPA recommendations per namespace
Compare current vs recommended settings
Export recommendations as Kubernetes manifests
Open-source and easy to deploy
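
Goldilocks is enabled per namespace through a label. The sketch below assumes Goldilocks and the VPA recommender are already installed; the namespace name payments is hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                  # hypothetical namespace
  labels:
    # Opts this namespace in to Goldilocks recommendations
    goldilocks.fairwinds.com/enabled: "true"
```

For labeled namespaces, Goldilocks creates recommendation-only VPA objects for the workloads it finds and surfaces the results in its dashboard.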

Resource Requests & Limits Best Practices

✓ DO

• Set requests based on actual P95 usage
• Set limits only where needed (CPU limits often cause throttling and can usually be omitted)
• Monitor and adjust based on real data
• Use QoS classes intentionally (Guaranteed, Burstable, BestEffort)
• Set memory limits to contain leaks and protect neighboring pods from node-level OOM kills

✗ DON'T

• Copy-paste requests from examples
• Set requests = limits without measuring
• Ignore VPA recommendations
• Over-provision "just to be safe"
• Forget to set requests at all
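
Put together, the guidance above could translate into a container spec like this sketch. The numbers are placeholders standing in for measured P95 usage, and the pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api                       # hypothetical pod name
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0   # placeholder image
      resources:
        requests:
          cpu: 250m               # assumed P95 CPU usage from monitoring
          memory: 512Mi           # assumed P95 memory usage from monitoring
        limits:
          memory: 768Mi           # headroom above the request; caps leaks
          # No CPU limit: the container may burst into idle CPU on the node
```

Because requests and limits differ, this pod lands in the Burstable QoS class. Setting requests equal to limits for both CPU and memory would make it Guaranteed instead.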

Storage & Network Cost Reduction

Storage Optimization

Potential savings: 40-70% of storage costs

Use cheaper storage classes (gp3 vs io2, standard vs ssd)
Enable volume expansion and grow PVCs on demand instead of over-provisioning up front
Delete unused PVCs (often forgotten after pod deletion)
Use object storage (S3, GCS) for logs/backups vs block storage
Implement storage lifecycle policies for automatic cleanup
Compress data and use deduplication where possible
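
The first two items can be combined in a single StorageClass. This is a sketch that assumes an AWS cluster with the EBS CSI driver installed; the class name gp3-expandable is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-expandable            # hypothetical class name
provisioner: ebs.csi.aws.com      # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3                       # general-purpose SSD, cheaper than io2
allowVolumeExpansion: true        # lets PVCs be resized later
reclaimPolicy: Delete             # delete the volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the pod's zone
```

With allowVolumeExpansion enabled, a PVC can start small and be grown later by raising spec.resources.requests.storage on the claim.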

Network & Egress Optimization

Potential savings: 50-80% of egress costs

Use private connectivity (such as AWS PrivateLink or Azure Private Link) for inter-service communication
Minimize cross-region/cross-AZ traffic with topology-aware routing
Implement caching layers (Redis, CDN) to reduce origin requests
Use VPC endpoints for AWS services to avoid NAT gateway costs
Compress data in transit (gzip, brotli)
Keep data in the region where it is processed to avoid unnecessary data movement
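
For the topology-aware routing item, one option is the Service annotation shown in this sketch. It assumes Kubernetes 1.27 or later; the Service name, selector, and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: catalog                   # hypothetical service name
  annotations:
    # Ask Kubernetes to prefer endpoints in the caller's zone
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: catalog
  ports:
    - port: 80
      targetPort: 8080
```

The hint is best-effort: Kubernetes only applies zone-local routing when endpoints are spread reasonably evenly across zones, so it works best for services with several replicas per zone.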

FinOps Best Practices (2025)

Cost Governance Framework

Visibility

• Real-time cost dashboards
• Label all resources (team, env, app)
• Chargeback/showback reports
• Cost anomaly detection

Optimization

• Automated rightsizing
• Spot instance strategies
• Resource quotas per namespace
• Idle resource cleanup

Governance

• Budget alerts and enforcement
• Policy-driven automation
• Regular cost reviews
• FinOps team ownership
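
Per-namespace resource quotas are one of the simpler governance controls to put in place. The sketch below uses a hypothetical team-a namespace and placeholder numbers.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota              # hypothetical quota name
  namespace: team-a               # hypothetical team namespace
spec:
  hard:
    requests.cpu: "20"            # total CPU the namespace may request
    requests.memory: 64Gi         # total memory the namespace may request
    limits.memory: 96Gi           # total memory limits across all pods
    persistentvolumeclaims: "10"  # cap on number of PVCs
    services.loadbalancers: "2"   # cap on cloud load balancers
```

Once a quota covers a compute resource, pods in that namespace must declare a request or limit for it, or they are rejected at admission. That also helps enforce the rule of never leaving requests unset.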

Quick Wins Checklist

Delete unused LoadBalancers and EBS volumes

Implement HPA so workloads scale in when load drops, such as during off-hours

Use spot instances for dev/test environments (up to 90% savings)

Switch to gp3 volumes (up to 20% cheaper per GB than gp2)

Implement Karpenter for automatic node optimization

Set up cost allocation labels on all resources

Review and right-size pod resource requests

Clean up old container images in registries
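
For the HPA item in the checklist, a minimal sketch might look like this. It assumes metrics-server (or another metrics API provider) is running; the name frontend-hpa, the target Deployment, and the thresholds are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend                # hypothetical Deployment
  minReplicas: 2                  # floor kept during quiet periods
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # percent of requested CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling in
```

HPA reacts to load rather than to the clock, so replicas drop during off-hours only because utilization drops. Utilization is measured against CPU requests, which is another reason to keep requests accurate.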

Key Takeaways

Use OpenCost or Kubecost for real-time visibility into Kubernetes spending
Karpenter can deliver 40-60% cost savings through intelligent node management and spot instances
Right-size resources with VPA and Goldilocks based on actual usage, not guesses
Implement FinOps governance with budgets, labels, and automated policies
Optimize storage and network costs, which are often overlooked but can account for 30% of total spend
