
Kubernetes in Production

Scaling & Performance

Master horizontal and vertical scaling, event-driven autoscaling with KEDA, and performance optimization strategies.

Horizontal Pod Autoscaler (HPA)

HPA automatically scales pod replicas based on CPU, memory, or custom metrics. It's the most common autoscaling mechanism in Kubernetes.

Key Features

  • Metrics from Metrics Server or custom APIs
  • CPU, memory, and custom application metrics
  • Configurable scale-up/down behavior
  • Stabilization windows to prevent flapping
  • Multiple metrics for smarter decisions

HPA v2 (Current)

  • Multiple metric sources simultaneously
  • Container-level resource metrics (see the fragment below)
  • External metrics from APM tools
  • Behavior configuration for scale velocity
  • Better handling of custom metrics
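
The container-level metric type lets the HPA measure a single container instead of the whole pod, so sidecars don't skew the signal. A metrics fragment as a sketch, assuming the application container is named app:

metrics:
- type: ContainerResource
  containerResource:
    name: cpu
    container: app  # measure only this container, ignoring sidecars
    target:
      type: Utilization
      averageUtilization: 70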

Example: HPA with Multiple Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

KEDA: Event-Driven Autoscaling (2025 Standard)

What is KEDA? (CNCF Graduated)

Kubernetes Event-Driven Autoscaling (KEDA) extends HPA to scale based on external event sources like message queues, databases, and cloud services. It can scale to zero and supports 60+ scalers out of the box.

Why KEDA?

  • Scale to zero: Save costs when idle
  • Event-driven: React to business events
  • 60+ scalers: Kafka, RabbitMQ, Redis, AWS SQS, etc.
  • No code changes: External metrics only
  • Works with HPA: Extends native Kubernetes

Common Use Cases

  • Background job processing (queues)
  • Event processing pipelines
  • Batch workloads triggered by data
  • Stream processing (Kafka consumers)
  • Scheduled scaling based on metrics

Kafka Scaler

Scale based on consumer group lag for a Kafka topic

Example: Scale consumers when lag exceeds a threshold (see the full ScaledObject below)


RabbitMQ Scaler

Scale based on queue length or message rate

Example: Process messages faster during high traffic
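
A trigger sketch for a ScaledObject: the queue name and the TriggerAuthentication holding the connection details are placeholders, and QueueLength is one of the scaler's two modes (the other being MessageRate):

triggers:
- type: rabbitmq
  metadata:
    mode: QueueLength   # scale on backlog depth; MessageRate is the alternative
    value: "100"        # target messages per replica
    queueName: orders   # hypothetical queue
  authenticationRef:
    name: rabbitmq-trigger-auth  # TriggerAuthentication with the host connection string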


AWS SQS Scaler

Scale based on approximate message count

Example: Auto-scale workers processing SQS messages
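
A trigger sketch under assumed names (the queue URL, region, and auth reference are placeholders); KEDA scales on the queue's approximate visible message count:

triggers:
- type: aws-sqs-queue
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs  # hypothetical queue
    queueLength: "20"   # target approximate messages per replica
    awsRegion: us-east-1
  authenticationRef:
    name: aws-trigger-auth  # e.g. pod identity via TriggerAuthentication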


Prometheus Scaler

Scale based on any Prometheus query

Example: Scale on custom business metrics
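
A trigger sketch assuming an in-cluster Prometheus at the address below; any instant-vector query works, here a hypothetical request-rate metric:

triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090  # assumed Prometheus endpoint
    query: sum(rate(http_requests_total{app="checkout"}[2m]))  # hypothetical query
    threshold: "100"          # target value per replica
    activationThreshold: "5"  # wake from zero above this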

Cron Scaler

Scale based on time schedules

Example: Pre-scale for known traffic patterns
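
A trigger sketch for business-hours pre-scaling (the timezone and schedule are assumptions); outside the window, other triggers or minReplicaCount take over:

triggers:
- type: cron
  metadata:
    timezone: Europe/Berlin   # assumed timezone
    start: 0 8 * * 1-5        # scale up weekdays at 08:00
    end: 0 20 * * 1-5         # scale back down at 20:00
    desiredReplicas: "10"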

Example: KEDA Kafka ScaledObject

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 0  # Scale to zero when idle
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-consumer-group
      topic: events-topic
      lagThreshold: "50"  # Scale when lag > 50 messages
      activationLagThreshold: "10"  # Wake from zero at 10 messages
  cooldownPeriod: 300  # Wait 5 min after the last active trigger before scaling to zero

Vertical Pod Autoscaler (VPA)

VPA adjusts CPU and memory requests/limits based on actual usage patterns, optimizing resource allocation.

VPA Modes

  • Off (Recommender): Only provides recommendations; doesn't modify pods
  • Initial: Sets requests at pod creation; doesn't update running pods
  • Auto: Automatically updates requests, restarting pods as needed
  • Recreate: Evicts and recreates pods with new requests

⚠️ Important Constraints

  • Don't combine VPA and HPA on the same CPU/memory metrics - the two controllers conflict
  • Use VPA for vertical scaling, HPA for horizontal
  • Or run VPA in recommender mode and apply its suggestions manually (see the sketch below)
  • VPA requires a pod restart to apply changes
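
A minimal recommender-mode manifest, assuming the VPA components from the kubernetes/autoscaler project are installed and targeting a hypothetical Deployment named app:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app  # hypothetical workload
  updatePolicy:
    updateMode: "Off"  # recommend only; never evicts or mutates pods

Recommendations then appear in the object's status, where they can be reviewed and applied to the Deployment manually.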

Node Autoscaling: Cluster Autoscaler vs Karpenter

Cluster Autoscaler (Traditional)

The standard Kubernetes node autoscaler; it scales cloud provider node groups up and down.

Pros:

  • Stable, battle-tested
  • Works with all cloud providers
  • Simple configuration
  • Good for basic use cases

Cons:

  • Slower scaling (minutes)
  • Limited instance type selection
  • Node group constraints
  • Less efficient bin-packing

Karpenter (Modern - 2025 Recommended)

Next-gen autoscaler with intelligent provisioning and consolidation; a NodePool sketch follows the lists below. See the Cost Optimization chapter for details.

Pros:

  • Fast scaling (seconds)
  • Intelligent instance selection
  • Automatic consolidation
  • Native spot instance support
  • Better cost optimization (40-60%)

Considerations:

  • AWS/Azure focus (GCP in beta)
  • Requires cluster reconfiguration
  • Different operational model
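
As a rough sketch of that operational model, here is a minimal NodePool assuming the Karpenter v1 API on AWS (the karpenter.sh/v1 group and the EC2NodeClass kind from karpenter.k8s.aws; exact field names track the Karpenter version you run):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]  # allow both spot and on-demand capacity
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default  # assumed EC2NodeClass holding subnet/AMI settings
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # repack and remove idle nodes
  limits:
    cpu: "100"  # cap total provisioned CPU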

Recommendation (2025)

Use Karpenter for AWS/Azure production clusters - The cost savings and operational improvements justify the migration effort. Stick with Cluster Autoscaler only if you need multi-cloud parity or have specific constraints.

Karpenter is now production-ready (CNCF Incubating) and used by major organizations. It's the future of Kubernetes node autoscaling.

Performance Optimization Best Practices

Resource Management

  • Set accurate resource requests based on real usage (see the fragment after this list)
  • Use QoS classes intentionally (Guaranteed vs Burstable)
  • Avoid CPU limits for latency-sensitive apps
  • Monitor and tune garbage collection settings
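
A container resources fragment illustrating these points (values are placeholders): requests sized from observed usage, a memory limit as a hard ceiling, and no CPU limit so latency-sensitive paths can burst instead of being throttled:

resources:
  requests:
    cpu: "500m"       # sized from observed usage, not guesswork
    memory: "512Mi"
  limits:
    memory: "512Mi"   # limit = request gives a predictable memory ceiling
    # CPU limit deliberately omitted to avoid CFS throttling

With a CPU request but no CPU limit the pod lands in the Burstable QoS class; set limits equal to requests for both resources only when you explicitly want Guaranteed.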

Application Optimization

  • Implement health checks (liveness, readiness, startup) - probe fragment below
  • Use connection pooling for databases
  • Implement caching layers (Redis, CDN)
  • Optimize container image size and layers
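
A probe fragment for a hypothetical HTTP service (paths, port, and timings are assumptions to tune per app):

containers:
- name: app
  image: example.com/app:1.0  # placeholder image
  startupProbe:               # gates the liveness probe during slow startup
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 2          # allows up to ~60s to start
  livenessProbe:              # restarts the container if it deadlocks
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
  readinessProbe:             # removes the pod from Service endpoints when not ready
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5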

Key Takeaways

  • HPA handles horizontal scaling based on metrics; VPA optimizes resource requests
  • KEDA enables event-driven autoscaling with scale-to-zero for cost savings
  • Karpenter is the modern choice for node autoscaling with superior cost optimization
  • Accurate resource requests are critical for all autoscaling mechanisms
  • Combine multiple autoscaling approaches for complete automation