Kubernetes in Production

Cost Optimization

Master Kubernetes FinOps with modern tools, strategies, and best practices for reducing infrastructure costs without sacrificing performance.

The Cost Challenge

Kubernetes infrastructure costs can spiral quickly without proper optimization. Common issues include over-provisioned resources (70-80% waste is typical), underutilized nodes, expensive persistent storage, and egress bandwidth charges. Modern FinOps practices combine automation, visibility, and intelligent workload placement, and can reduce costs by 40-60%.

Cost Monitoring & Visibility (2025)

OpenCost (CNCF Incubating)

Open-source Kubernetes cost monitoring with real-time allocation

Key Features:

  • Real-time cost allocation by namespace, pod, label
  • Multi-cloud support (AWS, GCP, Azure)
  • Integration with Prometheus and cloud billing APIs
  • Cost forecasting and budget alerts
  • 100% free and open-source

Best for: Organizations wanting full control and transparency

Kubecost (Commercial + Free Tier)

Enterprise-grade cost management with optimization recommendations

Key Features:

  • Detailed cost breakdown by cluster, namespace, deployment
  • Rightsizing recommendations with savings estimates
  • Unified multi-cluster cost visibility
  • Container cost allocation and chargeback
  • Free tier for single cluster

Best for: Enterprises with multi-cluster deployments

Cloud Provider Native Tools

Built-in cost management from cloud providers

Key Features:

  • AWS Cost Explorer with EKS filtering
  • GCP Cloud Billing with GKE labels
  • Azure Cost Management for AKS
  • Native integration with cloud services
  • Detailed billing reports and forecasting

Best for: Single-cloud deployments with existing cloud investments

FinOps Foundation Tools

FOCUS-compliant tools for standardized cost reporting

Key Features:

  • Standardized FinOps Open Cost & Usage Specification (FOCUS)
  • Cloud cost normalization across providers
  • Team attribution and showback/chargeback
  • Cost anomaly detection
  • Integration with ITFM/ITSM systems

Best for: Large organizations with FinOps teams

Intelligent Node Management

Karpenter: Next-Gen Autoscaling

Karpenter (an open-source node autoscaler maintained under Kubernetes SIG Autoscaling) provisions nodes quickly and picks instance types to fit pending pods, which can substantially reduce costs compared with the traditional Cluster Autoscaler.

Key Benefits

  • Provisions nodes in seconds vs minutes
  • Bin-packing optimization for cost efficiency
  • Automatic instance type selection
  • Consolidation to reduce node count
  • Native support for spot/preemptible instances

Cost Savings

  • 40-60% reduction vs on-demand only
  • Automatic rightsizing of node types
  • Intelligent spot instance usage
  • Reduced over-provisioning
  • Lower data transfer costs via topology

Example Karpenter NodePool:

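apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # Remove or replace nodes that are empty or could be packed more tightly
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m   # assumed value; tune to your workload churn
  template:
    spec:
      # karpenter.sh/v1 requires a nodeClassRef; this assumes an AWS
      # EC2NodeClass named "default" already exists in the cluster
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Allow both spot and on-demand capacity
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Restrict provisioning to a small set of instance types
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["c6i.large", "c6i.xlarge", "c7i.large"]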

Spot/Preemptible Instances

60-90% discount

Use interruptible instances for fault-tolerant workloads

  • Mix with on-demand for stability
  • Use multiple instance types for diversity
  • Implement graceful shutdown handling
  • Perfect for batch jobs, CI/CD, dev/test
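
The points above can be sketched as a Deployment that opts into spot capacity and allows time for a graceful shutdown. This is a minimal sketch, assuming the nodes are provisioned by Karpenter (which sets the karpenter.sh/capacity-type label) and that the image contains a shell; the workload name, image, and timing values are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker              # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Schedule only onto spot capacity (label set by Karpenter on its nodes)
      nodeSelector:
        karpenter.sh/capacity-type: spot
      # Time the pod gets to finish in-flight work after termination starts
      terminationGracePeriodSeconds: 60
      containers:
        - name: worker
          image: registry.example.com/batch-worker:1.0   # placeholder image
          lifecycle:
            preStop:
              exec:
                # Short pause so new work stops arriving before SIGTERM is sent
                command: ["sh", "-c", "sleep 10"]
```

The grace period should stay well under the provider's interruption notice (two minutes on AWS), and the application itself still needs to handle SIGTERM.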

Savings Plans & Reserved Instances

30-70% discount

Commit to compute usage for predictable workloads

  • Analyze baseline compute needs first
  • Start with 1-year commitments
  • Use Compute Savings Plans for flexibility
  • Monitor utilization to maximize ROI

Resource Right-Sizing

Vertical Pod Autoscaler (VPA)

Automatically adjusts CPU and memory requests/limits based on actual usage patterns.

  • Off mode (recommendation-only) provides actionable insights without changing running pods
  • Auto mode applies recommendations automatically, typically by evicting and recreating pods
  • Reduces waste from over-provisioning
  • Can reduce resource requests by 30-50%
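
As an illustration, a VPA object in recommendation-only mode might look like the following. This is a sketch, assuming the VPA components are installed in the cluster (they are not part of core Kubernetes); the target Deployment name "web" and the min/max bounds are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                  # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical target workload
  updatePolicy:
    updateMode: "Off"            # compute recommendations only; never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"       # apply the bounds to every container
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Running kubectl describe vpa web-vpa then shows the recommended requests in the object's status, which can be reviewed before switching to an automatic mode.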

Goldilocks (FairwindsOps)

Dashboard for VPA recommendations across all workloads with quality-of-service recommendations.

  • Visualize VPA recommendations per namespace
  • Compare current vs recommended settings
  • Export recommendations as Kubernetes manifests
  • Open-source and easy to deploy
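
Goldilocks is enabled per namespace through a label. A minimal sketch, assuming Goldilocks and the VPA components are already installed; the namespace name is hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                              # hypothetical namespace
  labels:
    # Tells Goldilocks to create VPA objects for workloads in this namespace
    goldilocks.fairwinds.com/enabled: "true"
```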

Resource Requests & Limits Best Practices

✓ DO

  • Set requests based on actual P95 usage
  • Use limits only when necessary; CPU limits are often best avoided because they cause throttling
  • Monitor and adjust based on real data
  • Use QoS classes intentionally (Guaranteed, Burstable, BestEffort)
  • Set memory limits so a leaking container is OOM-killed before it exhausts node memory

✗ DON'T

  • Copy-paste requests from examples
  • Set requests = limits without measuring
  • Ignore VPA recommendations
  • Over-provision "just to be safe"
  • Forget to set requests at all
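
A container spec following these guidelines might look like the sketch below. The numbers are placeholders standing in for measured P95 usage, and the pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api                                   # hypothetical pod name
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0     # placeholder image
      resources:
        requests:
          cpu: 250m          # placeholder for measured P95 CPU usage
          memory: 512Mi      # placeholder for measured P95 memory usage
        limits:
          memory: 512Mi      # caps memory; no CPU limit, so no CPU throttling
```

Because the CPU limit is omitted, this pod lands in the Burstable QoS class; setting CPU and memory limits equal to their requests on every container would make it Guaranteed.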

Storage & Network Cost Reduction

Storage Optimization

Storage costs can be reduced by 40-70%
  • Use cheaper storage classes (gp3 vs io2, standard vs ssd)
  • Implement PVC resize for growing workloads vs over-provisioning
  • Delete unused PVCs (often forgotten after pod deletion)
  • Use object storage (S3, GCS) for logs/backups vs block storage
  • Implement storage lifecycle policies for automatic cleanup
  • Compress data and use deduplication where possible
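
The first two points can be sketched as a StorageClass. This assumes an AWS cluster with the EBS CSI driver installed; the class name is arbitrary.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com       # AWS EBS CSI driver (must be installed)
parameters:
  type: gp3                        # cheaper general-purpose SSD tier
allowVolumeExpansion: true         # lets PVCs grow later instead of over-provisioning
reclaimPolicy: Delete              # delete the EBS volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # create the volume in the pod's zone
```

With allowVolumeExpansion enabled, a PVC can start small and be resized by editing its spec.resources.requests.storage value.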

Network & Egress Optimization

Egress costs can be reduced by 50-80%
  • Use private connectivity (e.g., AWS PrivateLink, Azure Private Link) for inter-service communication
  • Minimize cross-region/cross-AZ traffic with topology-aware routing
  • Implement caching layers (Redis, CDN) to reduce origin requests
  • Use VPC endpoints for AWS services to avoid NAT gateway costs
  • Compress data in transit (gzip, brotli)
  • Keep data in the region where it is consumed to avoid unnecessary data movement
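
Topology-aware routing is enabled per Service with an annotation. A minimal sketch, assuming Kubernetes 1.27 or later and nodes that carry zone labels; the service name, selector, and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: catalog                    # hypothetical service name
  annotations:
    # Ask Kubernetes to prefer endpoints in the caller's zone
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: catalog
  ports:
    - port: 80
      targetPort: 8080
```

This is a hint rather than a guarantee: if endpoints are unevenly spread across zones, traffic may still be routed cluster-wide.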

FinOps Best Practices (2025)

Cost Governance Framework

Visibility

  • Real-time cost dashboards
  • Label all resources (team, env, app)
  • Chargeback/showback reports
  • Cost anomaly detection
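
Labeling can be sketched as follows. The label keys follow the team/env/app scheme above; the values, workload name, and image are hypothetical. The labels are repeated on the pod template because cost tools generally allocate by pod labels.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout                   # hypothetical workload
  labels:
    team: payments
    env: prod
    app: checkout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:                      # pod-level labels used for cost allocation
        team: payments
        env: prod
        app: checkout
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.0   # placeholder image
```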

Optimization

  • Automated rightsizing
  • Spot instance strategies
  • Resource quotas per namespace
  • Idle resource cleanup
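
A per-namespace quota might look like the sketch below. The namespace name and all numbers are hypothetical and should come from the team's measured baseline.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: payments              # hypothetical namespace
spec:
  hard:
    requests.cpu: "20"             # total CPU requests allowed in the namespace
    requests.memory: 40Gi
    limits.memory: 80Gi
    persistentvolumeclaims: "10"   # cap on the number of PVCs
    services.loadbalancers: "2"    # cap on costly LoadBalancer Services
```

Once a quota covers a resource, pods that do not specify that request or limit are rejected, so pairing the quota with a LimitRange that supplies defaults is usually worthwhile.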

Governance

  • Budget alerts and enforcement
  • Policy-driven automation
  • Regular cost reviews
  • FinOps team ownership

Quick Wins Checklist

  • Delete unused LoadBalancers and EBS volumes
  • Implement HPA so workloads scale down when load drops during off-hours
  • Use spot instances for dev/test environments (up to 90% savings)
  • Switch to gp3 volumes (up to 20% cheaper per GB than gp2)
  • Implement Karpenter for automatic node optimization
  • Set up cost allocation labels on all resources
  • Review and right-size pod resource requests
  • Clean up old container images in registries
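
For the HPA item, a minimal sketch is shown below. It assumes metrics-server is installed and that the target pods set CPU requests; the Deployment name "web", the replica bounds, and the 70% target are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical target workload
  minReplicas: 2                   # floor reached when load is low
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before removing pods
```

HPA reacts to load rather than to the clock, so the off-hours savings come from replicas dropping toward minReplicas as traffic falls.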

Key Takeaways

  • Use OpenCost or Kubecost for real-time visibility into Kubernetes spending
  • Karpenter provides 40-60% cost savings through intelligent node management and spot instances
  • Right-size resources with VPA and Goldilocks based on actual usage, not guesses
  • Implement FinOps governance with budgets, labels, and automated policies
  • Optimize storage and network costs, which are often overlooked but can account for 30% of total spend