Kubernetes Cost Optimization for Ephemeral Environments
POC initiative that reduced infrastructure costs by $36K/year
- Cost Reduction: $36K/year
- Ephemeral Env Costs: -50%
- Cluster Resources: 100% → 50%
- Performance: Zero degradation
The Challenge
My client's AWS EKS infrastructure was running 10-15 ephemeral environments for pull request testing, each staying up 24/7 despite receiving zero traffic outside business hours. Roughly 50% of that capacity sat idle during off-peak hours and weekends, driving significant unnecessary cost.
The environments served a critical function—enabling developers to test changes in isolated, production-like environments—but the always-on approach was financially unsustainable.
The team needed a solution that would:
- Reduce costs without impacting the performance or behavior of the ephemeral environments
- Scale dynamically (including scale-to-zero) with traffic patterns
- "Wake up" ephemeral environments once traffic to them was detected
Existing infrastructure:
- Node autoscaling: Karpenter for dynamic EC2 provisioning
- Pod autoscaling: HPA based on CPU/memory metrics
- Service mesh: Istio with gateway traffic metrics exposed to Prometheus
- GitOps: Automated ephemeral environment provisioning per PR
The missing piece: HPA couldn't scale to zero, and we needed external metrics (HTTP requests) as a scaling trigger.
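For context, a typical HPA definition from the existing setup illustrates both limits: minReplicas cannot be set to 0 (short of the alpha HPAScaleToZero feature gate), and the triggers are resource metrics rather than traffic. A sketch, with illustrative names:

```yaml
# Representative HPA before the change (Deployment name is illustrative).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pr-1234-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pr-1234-app
  minReplicas: 1      # cannot be 0 without the alpha HPAScaleToZero feature gate
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # CPU-based, blind to actual HTTP traffic
```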
The Solution
Identifying the Solution
After researching scale-to-zero solutions, I proposed a proof-of-concept around KEDA (Kubernetes Event-Driven Autoscaling), which extends HPA to support:
- External metrics (Prometheus, Datadog, Kafka, PostgreSQL, and 50+ others)
- Scale-to-zero (impossible with native HPA)
- Custom scaling behavior (cooldown periods, scaling modifiers)
Why KEDA fit perfectly:
- We already exposed Istio gateway metrics to Prometheus
- Our ephemeral environments were event-driven (triggered by HTTP requests)
- KEDA's Prometheus scaler could monitor HTTP request rates and scale pods accordingly
- Supports scale-to-zero with configurable "wake-up" on first request
Implementation & Validation
To validate the POC, I followed a systematic rollout plan:
Phase 1: Development Environment Setup
- Deployed KEDA via Helm chart to dev cluster
- Modified Helm charts to replace HPA with KEDA ScaledObjects
- Configured Prometheus scaler targeting Istio gateway metrics
- Set scale-to-zero with 15-minute idle timeout
- Updated GitOps framework to support KEDA configuration in PR workflows
Phase 2: Testing the Sleep/Wake Cycle
- Spun up test environment from PR
- Monitored scale-to-zero after 15 minutes of inactivity
- Triggered wake-up by sending HTTP request to environment URL
- Verified cold-start behavior: pods scaled from 0 → 1 within 2 minutes (worst case, for a heavy Ruby application)
- Ran automated test suite to ensure zero functional regression
Phase 3: Developer Experience Enhancement
- Added a wake-environment command to the internal CLI tool for automatic environment activation
- This eliminated manual curl requests and improved UX
Phase 4: Rollout
After successful validation:
- ✅ Deployed KEDA to staging/testing environment
- ✅ Merged infrastructure PRs
- ✅ Documented new workflow and communicated changes to engineering team
- ✅ Monitored cost metrics for 2 weeks to confirm savings
Results & Impact
Within 2 months of rollout:
- 💰 $36K annual savings
- 📊 50% cost reduction for development and testing environments specifically
- 📉 Off-hours resource utilization: 50% idle → near-zero
- ⚡ Near-zero performance impact: cold start adds ~30 seconds to 2 minutes (acceptable for dev/test)
- 🚀 Faster feedback: Developers still get isolated environments per PR
- 🎯 100% GitOps: All changes tracked in version control
Technical Achievements
- Scale-to-zero working reliably across 10-15 concurrent environments
- KEDA ScaledObjects replacing ~50 HPA definitions
- Prometheus integration using existing Istio metrics (no new infrastructure)
- Automated wake-up via CLI tool improved developer experience
- Full rollback capability (GitOps-based, revert = one merge)
Lessons Learned
On POC Methodology
- Thorough validation is critical — Testing sleep/wake cycles in dev before rollout prevented surprises in other SDLC environments
- Developer experience matters — Adding the CLI wake-up feature eliminated friction and improved adoption
- Monitor, don't assume — Watched cost metrics for 2 weeks post-rollout to confirm projected savings
On KEDA & Scale-to-Zero
- KEDA > HPA for event-driven workloads — External metrics and scale-to-zero made it perfect for ephemeral environments
- Cold-start trade-offs are acceptable in dev/test — a 30-second to 2-minute wake-up was fine here, but wouldn't work for production
- Existing metrics are gold — Leveraging Istio metrics meant zero new infrastructure
On Cost Optimization
- Idle time = opportunity — 50% idle capacity was low-hanging fruit
- Right-size the solution — Could have used Lambda or Fargate, but KEDA kept us in Kubernetes (simpler for team)
- Small wins compound — $36K/year from one POC; similar optimizations in other areas could save 6 figures
Technologies Used
Infrastructure:
- AWS (EKS, EC2)
- Kubernetes
- Karpenter (node autoscaling)
- Terraform (IaC)
Autoscaling:
- KEDA (Kubernetes Event-Driven Autoscaling)
- Horizontal Pod Autoscaler (HPA) - replaced by KEDA ScaledObjects
Observability:
- Prometheus (metrics)
- Istio (service mesh, gateway metrics)
Developer Experience:
- GitOps (automated PR environments)
- Internal CLI tool (environment management)