Kubernetes Cost Optimization for Ephemeral Environments
POC initiative that reduced infrastructure costs by $36K/year
- Cost Reduction: $36K/year
- Ephemeral Env Costs: -50%
- Cluster Resources: 100% → 50%
- Performance: Zero degradation
The Challenge
My client's AWS EKS infrastructure was running 10-15 ephemeral environments for pull request testing, each staying up 24/7 despite receiving zero traffic outside business hours. Roughly 50% of that capacity sat idle during off-peak hours and weekends, driving significant unnecessary cost.
The environments served a critical function—enabling developers to test changes in isolated, production-like environments—but the always-on approach was financially unsustainable.
The team needed a solution that would:
- Reduce costs without impacting the performance or behavior of the ephemeral environments
- Scale dynamically (including scale-to-zero) with traffic patterns
- "Wake up" ephemeral environments once traffic to them was detected
Existing infrastructure:
- Node autoscaling: Karpenter for dynamic EC2 provisioning
- Pod autoscaling: HPA based on CPU/memory metrics
- Service mesh: Istio with gateway traffic metrics exposed to Prometheus
- GitOps: Automated ephemeral environment provisioning per PR
The missing piece: HPA couldn't scale to zero, and we needed external metrics (HTTP requests) as a scaling trigger.
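For context, a typical HPA definition from the existing setup illustrates both limits: minReplicas cannot be set to 0 (short of the alpha HPAScaleToZero feature gate), and the triggers are resource metrics rather than traffic. A sketch, with illustrative names:

```yaml
# Representative HPA before the change (Deployment name is illustrative).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pr-1234-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pr-1234-app
  minReplicas: 1      # cannot be 0 without the alpha HPAScaleToZero feature gate
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # CPU-based, blind to actual HTTP traffic
```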
The Solution
Identifying the Solution
After researching scale-to-zero solutions, I proposed a proof-of-concept around KEDA (Kubernetes Event-Driven Autoscaling), which extends HPA to support:
- External metrics (Prometheus, Datadog, Kafka, PostgreSQL, and 50+ others)
- Scale-to-zero (impossible with native HPA)
- Custom scaling behavior (cooldown periods, scaling modifiers)
Why KEDA fit perfectly:
- We already exposed Istio gateway metrics to Prometheus
- Our ephemeral environments were event-driven (triggered by HTTP requests)
- KEDA's Prometheus scaler could monitor HTTP request rates and scale pods accordingly
- Supports scale-to-zero with configurable "wake-up" on first request
Implementation & Validation
To validate the POC, I followed a systematic rollout plan:
Phase 1: Development Environment Setup
- Deployed KEDA via Helm chart to dev cluster
- Modified Helm charts to replace HPA with KEDA ScaledObjects
- Configured Prometheus scaler targeting Istio gateway metrics
- Set scale-to-zero with 15-minute idle timeout
- Updated GitOps framework to support KEDA configuration in PR workflows
Phase 2: Testing the Sleep/Wake Cycle
- Spun up test environment from PR
- Monitored scale-to-zero after 15 minutes of inactivity
- Triggered wake-up by sending HTTP request to environment URL
- Verified cold-start behavior: pods scaled from 0 → 1 within 2 minutes (worst case, for a heavy Ruby application)
- Ran automated test suite to ensure zero functional regression
Phase 3: Developer Experience Enhancement
- Added a wake-environment command to the internal CLI tool for automatic environment activation
- This eliminated manual curl requests and improved UX
Phase 4: Rollout
After successful validation:
- ✅ Deployed KEDA to staging/testing environment
- ✅ Merged infrastructure PRs
- ✅ Documented new workflow and communicated changes to engineering team
- ✅ Monitored cost metrics for 2 weeks to confirm savings
Results & Impact
Within 2 months of rollout:
- 💰 $36K annual savings
- 📊 50% cost reduction for development and testing environments specifically
- 📉 Off-hours resource utilization: 50% idle → near-zero
- ⚡ Near-zero performance impact: cold start adds ~30 seconds to 2 minutes (acceptable for dev/test)
- 🚀 Faster feedback: Developers still get isolated environments per PR
- 🎯 100% GitOps: All changes tracked in version control
Technical Achievements
- Scale-to-zero working reliably across 10-15 concurrent environments
- KEDA ScaledObjects replacing ~50 HPA definitions
- Prometheus integration using existing Istio metrics (no new infrastructure)
- Automated wake-up via CLI tool improved developer experience
- Full rollback capability (GitOps-based, revert = one merge)
Lessons Learned
On POC Methodology
- Thorough validation is critical — Testing sleep/wake cycles in dev before rollout prevented surprises in other SDLC environments
- Developer experience matters — Adding the CLI wake-up feature eliminated friction and improved adoption
- Monitor, don't assume — Watched cost metrics for 2 weeks post-rollout to confirm projected savings
On KEDA & Scale-to-Zero
- KEDA > HPA for event-driven workloads — External metrics and scale-to-zero made it perfect for ephemeral environments
- Cold-start trade-offs are acceptable in dev/test — a 30-second to 2-minute wake-up was fine here, but wouldn't work for production
- Existing metrics are gold — Leveraging Istio metrics meant zero new infrastructure
On Cost Optimization
- Idle time = opportunity — 50% idle capacity was low-hanging fruit
- Right-size the solution — Could have used Lambda or Fargate, but KEDA kept us in Kubernetes (simpler for team)
- Small wins compound — $36K/year from one POC; similar optimizations in other areas could save 6 figures
Technologies Used
Infrastructure:
- AWS (EKS, EC2)
- Kubernetes
- Karpenter (node autoscaling)
- Terraform (IaC)
Autoscaling:
- KEDA (Kubernetes Event-Driven Autoscaling)
- Horizontal Pod Autoscaler (HPA) - replaced by KEDA ScaledObjects
Observability:
- Prometheus (metrics)
- Istio (service mesh, gateway metrics)
Developer Experience:
- GitOps (automated PR environments)
- Internal CLI tool (environment management)