Fintech Startup
How We Cut a Fintech's Cloud Bill by 75% Before Day 1
The Challenge
A pre-revenue fintech startup received a proposal from a "Modern DevOps Agency." The architecture was "Portfolio-Driven Development" at its finest: AWS EKS (Kubernetes) clusters for every environment, oversized node groups, and complex service meshes, all for an app with zero active users.
The projected cost was up to $20,000/month, which would have drained their seed funding rapidly.
The Intervention
We performed a forensic architecture review and concluded that this complexity was unnecessary for the "0 to 1" phase. We stripped out the EKS control-plane overhead and migrated the workload to AWS ECS on Fargate.
The Forensic Accounting
1. The "Bloated" Proposal (The Resume-Driven Stack)
| Cost Driver | Technical Justification (The Waste) | Monthly Cost (Est.) |
|---|---|---|
| Compute (Prod) | 6x m5.4xlarge (16 vCPU) across 3 AZs for HA. (Price: ~$0.768/hr x 730 hrs x 6 nodes) | ~$3,360 |
| Compute (Non-Prod) | Mirror of Prod (6 more nodes across Dev + UAT) running 24/7. (Agencies rarely script shutdown logic for EKS node groups) | ~$3,360 |
| EKS Control Plane | $0.10/hr per cluster x 3 Clusters (Dev, UAT, Prod). ($73/mo x 3) | ~$220 |
| Networking | 9 NAT Gateways (3 AZs x 3 Envs). ($0.045/hr x 9 GWs x 730 hrs + data processing) | ~$600+ |
| Observability | Datadog Enterprise (Host-based licensing). ($23/node x 12 nodes + Custom Metrics/Logs) | ~$2,500+ |
| Load Balancers | Dedicated ALBs per microservice (20 services). (Common anti-pattern: 1 ALB per Service) | ~$1,500 |
| Database | Provisioned Aurora (Writer + Reader). (Running 24/7 even when idle) | ~$1,200 |
| TOTAL | Variance depends on observability and data transfer | ~$12,740 - $20,000+ |
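As a sanity check on the table, the hourly-priced line items can be recomputed from the quoted rates. This is a minimal sketch that assumes a 730-hour month and the on-demand rates listed above (real prices vary by region and change over time); the values in `bloated` are the table's rounded lower-bound figures, not billing data.

```python
# Recompute the hourly-priced line items of the "bloated" proposal.
# Assumptions: a 730-hour month and the on-demand rates quoted in the table
# (real prices differ by region and change over time).
HOURS_PER_MONTH = 730


def hourly_to_monthly(rate_per_hour: float, count: int = 1) -> float:
    """Monthly cost of `count` resources billed at `rate_per_hour` each."""
    return rate_per_hour * HOURS_PER_MONTH * count


prod_nodes = hourly_to_monthly(0.768, count=6)          # 6x m5.4xlarge
eks_control_planes = hourly_to_monthly(0.10, count=3)   # Dev, UAT, Prod clusters
nat_hourly_only = hourly_to_monthly(0.045, count=9)     # excludes data processing

# Rounded lower-bound line items from the table (USD/month).
bloated = {
    "compute_prod": 3_360,
    "compute_non_prod": 3_360,
    "eks_control_plane": 220,
    "networking": 600,
    "observability": 2_500,
    "load_balancers": 1_500,
    "database": 1_200,
}

if __name__ == "__main__":
    print(f"Prod nodes:          ${prod_nodes:,.2f}")          # $3,363.84
    print(f"EKS control planes:  ${eks_control_planes:,.2f}")  # $219.00
    print(f"NAT hourly charges:  ${nat_hourly_only:,.2f}")     # $295.65
    print(f"Lower-bound total:   ${sum(bloated.values()):,}")  # $12,740
```

The NAT figure covers only the hourly charge; the table's "~$600+" adds data-processing fees on top.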
2. The "Optimized" Reality (The Velocity Stack)
| Cost Driver | Optimization Strategy | Monthly Cost (Verified) |
|---|---|---|
| Compute (Prod) | ECS Fargate. Right-sized tasks (1 vCPU). (No idle capacity; pay per second) | ~$800 |
| Compute (Non-Prod) | Fargate Spot (up to 70% discount). Auto-shutdown nights/weekends (~160 hrs/mo usage). | ~$150 |
| Control Plane | ECS Control Plane is Free. | $0 |
| Networking | Shared NAT Gateway (1 per Env). (3 NATs total vs. 9) | ~$150 |
| Observability | CloudWatch Container Insights. (Optimized ingestion) | ~$500 |
| Load Balancers | Shared ALB (Path-based routing). (1 ALB per Environment) | ~$100 |
| Database | Aurora Serverless v2. (Scales to 0.5 ACU ($0.06/hr) when idle) | ~$800 |
| TOTAL | | ~$2,500 - $4,800 |
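The same check for the optimized stack, plus the Aurora Serverless v2 idle floor mentioned in the Database row. This sketch assumes a price of $0.12 per ACU-hour, which is what the table's "0.5 ACU ($0.06/hr)" implies; the floor covers compute capacity only, so storage, I/O, and any capacity above 0.5 ACU are billed on top.

```python
# Optimized stack: rounded line items from the table (USD/month).
HOURS_PER_MONTH = 730

optimized = {
    "compute_prod": 800,       # ECS Fargate, right-sized 1 vCPU tasks
    "compute_non_prod": 150,   # Fargate Spot + nights/weekends shutdown
    "control_plane": 0,        # no separate ECS control-plane fee
    "networking": 150,         # 3 NAT Gateways (1 per environment)
    "observability": 500,      # CloudWatch Container Insights
    "load_balancers": 100,     # 1 shared ALB per environment
    "database": 800,           # Aurora Serverless v2
}

# Aurora Serverless v2 idle floor. Assumption: $0.12 per ACU-hour, as implied
# by the table's "0.5 ACU ($0.06/hr)"; check your region's pricing.
ACU_PRICE_PER_HOUR = 0.12
MIN_ACU = 0.5
aurora_idle_floor = MIN_ACU * ACU_PRICE_PER_HOUR * HOURS_PER_MONTH

if __name__ == "__main__":
    print(f"Lower-bound total:  ${sum(optimized.values()):,}")   # $2,500
    print(f"Aurora idle floor:  ${aurora_idle_floor:,.2f}/mo")   # $43.80/mo
```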
The Architecture
Technical Implementation
We implemented aggressive auto-scaling policies and a serverless-first approach.
- Compute: AWS Fargate (Fargate Spot for Dev/UAT, On-Demand for Prod).
- IaC: Terraform (modularized for rapid teardown).
- Database: Aurora Serverless v2 (capacity scales with load, billed per ACU-hour down to a 0.5 ACU floor).
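Target-tracking policies (for example on ECS's predefined `ECSServiceAverageCPUUtilization` metric) add or remove tasks to hold a metric near a target value. The sketch below is a simplified proportional model of that idea, not AWS's exact algorithm; the function name, default bounds, and example thresholds are illustrative assumptions.

```python
import math


def target_tracking_desired_count(
    current_tasks: int,
    current_metric: float,
    target_metric: float,
    min_tasks: int = 1,
    max_tasks: int = 10,
) -> int:
    """Simplified proportional estimate of the task count that would bring a
    utilization metric (e.g. average CPU %) back to `target_metric`,
    clamped to [min_tasks, max_tasks]. Not AWS's exact algorithm."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Assume load spreads evenly across tasks: total work (tasks x utilization)
    # divided by the per-task target gives the tasks needed, rounded up.
    raw = math.ceil(current_tasks * current_metric / target_metric)
    return max(min_tasks, min(max_tasks, raw))


if __name__ == "__main__":
    print(target_tracking_desired_count(4, 90.0, 60.0))  # scale out: 6
    print(target_tracking_desired_count(4, 15.0, 60.0))  # scale in: 1
    print(target_tracking_desired_count(8, 95.0, 50.0))  # capped at max: 10
```

Setting `min_tasks=0` for non-prod services lets an idle environment shrink to nothing, which is where the per-second Fargate billing pays off.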
Key Results
New Monthly Bill: < $4,800
Annual Savings: ~$182,000
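Both results, and the 75% in the title, follow from the two cost tables. A quick arithmetic check, assuming savings are measured between matching ends of each table's range (upper bound against upper bound, lower against lower):

```python
def annual_savings(before_monthly: float, after_monthly: float) -> float:
    """Yearly difference between two monthly bills."""
    return (before_monthly - after_monthly) * 12


def percent_reduction(before_monthly: float, after_monthly: float) -> float:
    """How much smaller the new monthly bill is, as a percentage."""
    return 100 * (1 - after_monthly / before_monthly)


if __name__ == "__main__":
    # Upper bounds: $20,000 proposed vs. $4,800 optimized.
    print(f"Annual savings:   ${annual_savings(20_000, 4_800):,.0f}")   # $182,400
    print(f"Reduction (high): {percent_reduction(20_000, 4_800):.0f}%")  # 76%
    # Lower bounds: $12,740 proposed vs. $2,500 optimized.
    print(f"Reduction (low):  {percent_reduction(12_740, 2_500):.0f}%")  # 80%
```

Either way, the reduction lands at or above the 75% claimed in the title.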
Where did the savings come from?
- The "Cluster Tax" ($3k/mo): We deleted the EKS control planes and the oversized node groups. With ECS Fargate, you don't pay for the "orchestrator"; you pay only for the app's running tasks.
- The "Zombie Infrastructure" ($6k/mo): The original proposal ran Dev/UAT environments on oversized servers 24/7. We moved them to Fargate Spot and automated them to sleep when developers sleep.
- The "Network Bloat" ($2k/mo): The proposal called for 9 NAT Gateways (one for every Availability Zone in every environment) and a dedicated ALB for each of 20 microservices. We consolidated to 3 shared NAT Gateways and one path-routed ALB per environment.
- The "License Trap" ($4k/mo): We replaced host-based licensing (Datadog) with usage-based monitoring (CloudWatch Container Insights), removing the per-host penalty for scaling out.
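The "sleep when developers sleep" schedule and the non-prod line in the optimized table can be modeled together. This sketch assumes a hypothetical 09:00-17:00, Monday-Friday window (8 h x 5 days x 4 weeks = 160 h, matching the table's ~160 hrs/mo) and the full 70% Fargate Spot discount; `non_prod_desired_count` and `monthly_cost_factor` are hypothetical helpers, not AWS APIs.

```python
from datetime import datetime

# Hypothetical working-hours window for Dev/UAT (an assumption, not from the
# original setup): 09:00-17:00, Monday-Friday.
WORK_START_HOUR = 9
WORK_END_HOUR = 17             # exclusive
WORKDAYS = {0, 1, 2, 3, 4}     # datetime.weekday(): Monday=0 ... Friday=4

# 8 h/day x 5 days x 4 weeks = 160 h
HOURS_ON_PER_MONTH = (WORK_END_HOUR - WORK_START_HOUR) * len(WORKDAYS) * 4


def non_prod_desired_count(now: datetime, awake_count: int = 1) -> int:
    """Desired task count for a non-prod service: `awake_count` during
    working hours, 0 (asleep) at night and on weekends."""
    is_workday = now.weekday() in WORKDAYS
    in_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return awake_count if (is_workday and in_hours) else 0


def monthly_cost_factor(hours_on: float, spot_discount: float = 0.70,
                        hours_per_month: float = 730) -> float:
    """Fraction of an always-on On-Demand bill left after applying the
    schedule and a Spot discount."""
    return (hours_on / hours_per_month) * (1 - spot_discount)


if __name__ == "__main__":
    print(non_prod_desired_count(datetime(2024, 1, 3, 10)))  # Wednesday 10:00 -> 1
    print(non_prod_desired_count(datetime(2024, 1, 6, 10)))  # Saturday 10:00 -> 0
    print(f"{monthly_cost_factor(HOURS_ON_PER_MONTH):.1%}")   # 6.6%
```

Under these assumptions, the same non-prod capacity costs roughly 6.6% of what it would running 24/7 On-Demand. In a real deployment, a schedule like this would more likely be expressed as Application Auto Scaling scheduled actions on each ECS service than as custom code.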
"If your Day 1 architecture costs more than your first 5 employees, you don't have a scaling strategy. You have a spending problem."