Scaling & Performance
Master horizontal and vertical scaling, event-driven autoscaling with KEDA, and performance optimization strategies.
Horizontal Pod Autoscaler (HPA)
HPA automatically scales pod replicas based on CPU, memory, or custom metrics. It's the most common autoscaling mechanism in Kubernetes.
Key Features
- Metrics from Metrics Server or custom APIs
- CPU, memory, and custom application metrics
- Configurable scale-up/scale-down behavior
- Stabilization windows to prevent flapping
- Multiple metrics for smarter decisions
HPA v2 (Current)
- Multiple metric sources simultaneously
- Container-level resource metrics
- External metrics from APM tools
- Behavior configuration for scale velocity
- Better handling of custom metrics
Example: HPA with Multiple Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

KEDA: Event-Driven Autoscaling (2025 Standard)
What is KEDA? (CNCF Graduated)
Kubernetes Event-Driven Autoscaling (KEDA) extends HPA to scale based on external event sources like message queues, databases, and cloud services. It can scale to zero and supports 60+ scalers out of the box.
Why KEDA?
- Scale to zero: save costs when idle
- Event-driven: react to business events
- 60+ scalers: Kafka, RabbitMQ, Redis, AWS SQS, etc.
- No code changes: external metrics only
- Works with HPA: extends native Kubernetes
Common Use Cases
- Background job processing (queues)
- Event processing pipelines
- Batch workloads triggered by data
- Stream processing (Kafka consumers)
- Scheduled scaling based on metrics
Kafka Scaler
Scale based on Kafka topic lag or offset
Example: Scale consumers when lag exceeds threshold
RabbitMQ Scaler
Scale based on queue length or message rate
Example: Process messages faster during high traffic
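A minimal trigger stanza sketch for this scaler; the queue name, threshold, and host variable are illustrative, not from the source:

triggers:
- type: rabbitmq
  metadata:
    protocol: amqp
    queueName: orders            # hypothetical queue
    mode: QueueLength            # scale on queued message count
    value: "20"                  # roughly one replica per 20 messages
    hostFromEnv: RABBITMQ_HOST   # AMQP connection string read from the pod env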
AWS SQS Scaler
Scale based on approximate message count
Example: Auto-scale workers processing SQS messages
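A minimal trigger stanza sketch; the queue URL and TriggerAuthentication name are illustrative:

triggers:
- type: aws-sqs-queue
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs   # hypothetical queue
    queueLength: "5"              # target messages per replica
    awsRegion: us-east-1
  authenticationRef:
    name: aws-credentials         # assumed TriggerAuthentication for IAM access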
Prometheus Scaler
Scale based on any Prometheus query
Example: Scale on custom business metrics
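A minimal trigger stanza sketch; the server address, query, and threshold are illustrative:

triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090   # assumed in-cluster address
    query: sum(rate(orders_processed_total[2m]))       # hypothetical business metric
    threshold: "100"                                   # one replica per 100 events/sec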
Cron Scaler
Scale based on time schedules
Example: Pre-scale for known traffic patterns
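A minimal trigger stanza sketch; the schedule and replica count are illustrative:

triggers:
- type: cron
  metadata:
    timezone: America/New_York
    start: 0 8 * * 1-5        # weekdays at 08:00
    end: 0 18 * * 1-5         # weekdays at 18:00
    desiredReplicas: "10"     # pre-scale for business hours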
Example: KEDA Kafka ScaledObject
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 0          # Scale to zero when idle
  maxReplicaCount: 30
  cooldownPeriod: 300         # Wait 5 min after last activity before scaling to zero
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-consumer-group
      topic: events-topic
      lagThreshold: "50"               # Scale when lag > 50 messages
      activationLagThreshold: "10"     # Wake from zero at 10 messages

Vertical Pod Autoscaler (VPA)
VPA adjusts CPU and memory requests/limits based on actual usage patterns, optimizing resource allocation.
VPA Modes
- Off (Recommender): only provides recommendations, doesn't modify pods (see the manifest sketch below)
- Initial: sets requests at pod creation, doesn't update running pods
- Auto: automatically updates requests and restarts pods
- Recreate: evicts and recreates pods with new requests
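A minimal VPA manifest sketch in recommender mode; it assumes a Deployment named app (matching the HPA example above):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Off"   # recommender mode: report recommendations, never evict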
⚠️ Important Constraints
- ✗ Don't use VPA and HPA on the same CPU/memory metrics: they will fight over replica counts and resource requests
- Use VPA for vertical scaling and HPA for horizontal scaling, driven by different metrics
- Or run VPA in recommender (Off) mode and apply its suggestions manually
- VPA requires a pod restart to apply changes (in Auto/Recreate modes)
Node Autoscaling: Cluster Autoscaler vs Karpenter
Cluster Autoscaler (Traditional)
Standard Kubernetes node autoscaling, works with cloud provider node groups.
Pros:
- Stable, battle-tested
- Works with all cloud providers
- Simple configuration
- Good for basic use cases
Cons:
- Slower scaling (minutes)
- Limited instance type selection
- Node group constraints
- Less efficient bin-packing
Karpenter (Modern - 2025 Recommended)
Next-gen autoscaler with intelligent provisioning and consolidation. See Cost Optimization chapter for details.
Pros:
- Fast scaling (seconds)
- Intelligent instance selection
- Automatic consolidation
- Native spot instance support
- Better cost optimization (commonly cited 40-60% compute savings)
Considerations:
- AWS/Azure focus (GCP in beta)
- Requires cluster reconfiguration
- Different operational model
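For a feel of that operational model, here is a minimal NodePool sketch using the Karpenter v1 API on AWS; the EC2NodeClass name, limits, and consolidation settings are illustrative:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # prefer spot, fall back to on-demand
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                   # assumed EC2NodeClass
  limits:
    cpu: "1000"                         # cap total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                # repack workloads onto fewer nodes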
Recommendation (2025)
Use Karpenter for AWS/Azure production clusters: the cost savings and operational improvements justify the migration effort. Stick with Cluster Autoscaler only if you need multi-cloud parity or have specific constraints.
Karpenter is now production-ready (CNCF Incubating) and used by major organizations. It's the future of Kubernetes node autoscaling.
Performance Optimization Best Practices
Resource Management
- Set accurate resource requests based on real usage
- Use QoS classes intentionally (Guaranteed vs Burstable)
- Avoid CPU limits for latency-sensitive apps (see the sketch after this list)
- Monitor and tune garbage collection settings
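A pod spec sketch illustrating these guidelines; the name, image, and values are illustrative, and requests should come from observed usage:

apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-app             # hypothetical workload
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # illustrative image
    resources:
      requests:
        cpu: "500m"        # sized from real usage (e.g., VPA recommendations)
        memory: "512Mi"
      limits:
        memory: "512Mi"    # memory limit = request for predictable OOM behavior
        # no CPU limit: avoids throttling; note this yields Burstable (not Guaranteed) QoS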
Application Optimization
- Implement health checks (liveness, readiness, startup probes; see the sketch after this list)
- Use connection pooling for databases
- Implement caching layers (Redis, CDN)
- Optimize container image size and layers
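A container snippet sketching all three probe types; the endpoints, port, and timings are illustrative:

containers:
- name: app
  image: registry.example.com/app:1.0   # illustrative image
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 2
    failureThreshold: 30    # allow up to ~60s for slow startup
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10       # restart the container if this fails
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5        # gate traffic until the app reports ready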
Key Takeaways
- HPA handles horizontal scaling based on metrics; VPA optimizes resource requests
- KEDA enables event-driven autoscaling with scale-to-zero for cost savings
- Karpenter is the modern choice for node autoscaling with superior cost optimization
- Accurate resource requests are critical for all autoscaling mechanisms
- Combine autoscaling layers (pod, node, and event-driven) for end-to-end automation