FinOps: where your AWS costs are really hiding
The most underestimated AWS cost lines for SMBs — data transfer, NAT Gateway, CloudWatch, orphan resources. With CLI commands to hunt them down.
When you look at an AWS bill for the first time, the reflex is to attack EC2 and RDS. They’re visible, big, well-known. The problem: between 30 and 50 % of an SMB’s AWS bill hides in side lines — invisible in monthly reviews, never discussed in architecture reviews, and yet largely eliminable.
This article inventories the lines we systematically find on Distribuée engagements, and gives the AWS CLI commands to hunt them.
The actual breakdown of a bill
Across ~150 cumulative audits on SMBs between $5K and $100K/month, here’s the typical breakdown we observe:
The orange lines (data transfer, NAT, CloudWatch, orphan resources) make up a third of the bill. That’s where you recover the most, the fastest.
Line 1 — Data transfer, the invisible champion
Data transfer is billed every time a byte crosses an AZ, VPC, or region boundary, or goes out to the Internet. Most architectures have never been optimized on this axis because nothing visibly breaks.
The most common paths to fix:
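EC2 → S3 without VPC endpoint: traffic routes through the NAT Gateway to reach S3, so every GB pays the NAT processing fee (plus inter-region transfer if the bucket lives in another region). With a Gateway Endpoint, that path is free.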
EC2 → RDS cross-AZ: $0.01/GB outbound, $0.01/GB inbound. On a chatty SQL app, it’s massive. Fix: Multi-AZ for resilience, but read preferentially from the same-AZ replica.
Inter-region: $0.02/GB. Often a leftover from multi-region setups that no longer make sense. Simple question: do you really need that secondary region?
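To put a number on the cross-AZ case, here is a minimal sketch that multiplies a monthly volume by the per-direction rate quoted above. The function name and the 5 TB figure are illustrative assumptions, not measurements from an audit.

```shell
# Rough monthly cost of traffic crossing an AZ boundary.
# Assumption: $0.01/GB billed on each side, as quoted above.
cross_az_monthly_cost() {
  # $1 = GB per month crossing the AZ boundary
  awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * (0.01 + 0.01) }'
}

cross_az_monthly_cost 5000   # 5 TB/month of app <-> DB traffic, prints 100.00
```

Plugging in the volume reported by Cost Explorer for your own account gives a quick sense of whether same-AZ reads are worth the effort.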
To map your situation:
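# Top data transfer costs last month, by usage type
# `date -v-1m` is BSD/macOS syntax; on GNU/Linux use:
#   date -d "$(date +%Y-%m-01) -1 month" +%Y-%m-01
# Usage type group names below are the usual ones; list yours with:
#   aws ce get-dimension-values --dimension USAGE_TYPE_GROUP --time-period Start=...,End=...
# Amount comes back as a string, hence to_number() before comparing
aws ce get-cost-and-usage \
--time-period Start=$(date -v-1m +%Y-%m-01),End=$(date +%Y-%m-01) \
--granularity MONTHLY \
--metrics UnblendedCost \
--group-by Type=DIMENSION,Key=USAGE_TYPE \
--filter '{"Dimensions":{"Key":"USAGE_TYPE_GROUP","Values":["EC2: Data Transfer - Inter AZ","EC2: Data Transfer - Internet (Out)","EC2: Data Transfer - Region to Region (Out)"]}}' \
--query 'ResultsByTime[].Groups[?to_number(Metrics.UnblendedCost.Amount)>`50`].[Keys[0],Metrics.UnblendedCost.Amount]' \
--output table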
Line 2 — NAT Gateway, the silent killer
A NAT Gateway costs $0.045/hr baseline + $0.045/GB processed. On a VPC pushing 1 TB/month to the Internet, that’s $45 (processing) + ~$33 (730 hours) = ~$78/month per NAT, per AZ.
With 3 AZs × 3 environments × 2 regions, that’s 18 gateways: about $590/month, or roughly $7,000/year, in hourly charges alone, before a single byte is processed.
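The arithmetic is easy to replay for your own topology. This is a small sketch using the two rates quoted above and assuming a 730-hour month; the function name is made up for the example.

```shell
# Estimated monthly NAT Gateway cost.
# Assumptions: $0.045/hr, $0.045/GB processed, 730 hours per month.
nat_monthly_cost() {
  # $1 = GB processed per month per gateway, $2 = number of gateways (default 1)
  awk -v gb="$1" -v n="${2:-1}" 'BEGIN {
    hourly = 0.045 * 730       # fixed charge per gateway
    processing = 0.045 * gb    # per-GB processing charge
    printf "%.2f\n", n * (hourly + processing)
  }'
}

nat_monthly_cost 1000      # one gateway, 1 TB/month: prints 77.85
nat_monthly_cost 0 18      # 18 idle gateways: prints 591.30
```

The second call shows why gateway count matters more than traffic on small accounts: the fixed charge is paid even when nothing flows through.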
Fixes by yield:
- VPC Gateway Endpoints for S3 and DynamoDB (free): eliminate ~40 % of typical NAT traffic
- VPC Interface Endpoints for AWS services (~$7/month per endpoint per AZ, plus a per-GB processing fee; often worth it): ECR, Secrets Manager, SSM, STS
- Single NAT per environment when cross-AZ resilience isn’t critical
- NAT Instances for non-prod (~$5/month)
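For the first fix in the list, adding an S3 Gateway Endpoint is a single `aws ec2 create-vpc-endpoint` call. The wrapper below is a hypothetical helper (its name, the `DRY_RUN` variable and the sample IDs are assumptions) that prints the command by default so it can be reviewed before anything is created.

```shell
# Hypothetical helper: prints (or runs) the call that adds an S3 Gateway Endpoint.
create_s3_gateway_endpoint() {
  # $1 = VPC ID, $2 = route table ID, $3 = region
  set -- aws ec2 create-vpc-endpoint \
    --vpc-id "$1" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.$3.s3" \
    --route-table-ids "$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # default: only show the command
  else
    "$@"           # DRY_RUN=0: actually create the endpoint
  fi
}

# Placeholder IDs, replace with your own:
create_s3_gateway_endpoint vpc-0abc rtb-0def eu-west-1
```

The route table passed in should be the one used by the private subnets whose S3 traffic currently goes through the NAT; the same call with `dynamodb` in the service name would cover DynamoDB.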
# List active NAT Gateways (ID, subnet, Name tag); each one carries the hourly charge before any traffic
aws ec2 describe-nat-gateways \
--filter "Name=state,Values=available" \
--query 'NatGateways[].[NatGatewayId,SubnetId,Tags[?Key==`Name`].Value | [0]]' \
--output table
Line 3 — CloudWatch Logs, the silent debt
CloudWatch bills on three axes:
- Ingestion: $0.50/GB
- Storage: $0.03/GB/month
- Insights queries: $0.005/GB scanned
On verbose workloads (Lambda in debug, access logs sent to CloudWatch without retention), we routinely see $200–$800/month in CloudWatch, most of it logs nobody ever reads.
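A quick way to see which axis dominates is to price ingestion and storage separately. The sketch below uses the rates listed above; the function name and the sample volumes are illustrative assumptions.

```shell
# Estimated monthly CloudWatch Logs cost (Insights queries not included).
# Assumptions: $0.50/GB ingested, $0.03/GB-month stored, as listed above.
cw_logs_monthly_cost() {
  # $1 = GB ingested per month, $2 = GB currently stored
  awk -v ingest="$1" -v stored="$2" 'BEGIN {
    printf "%.2f\n", ingest * 0.50 + stored * 0.03
  }'
}

cw_logs_monthly_cost 500 3000   # prints 340.00 (250 ingestion + 90 storage)
```

On these sample numbers ingestion is almost three quarters of the total, which suggests that cutting log volume at the source matters at least as much as retention.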
Fixes:
- Default retention, not “Never expire”: 30 days for prod, 7 for staging, 1 for dev
- Lambda ingestion filters: drop DEBUG in prod
- Subscription filter (through Firehose) to S3 for long-term logs: 25× cheaper than CloudWatch
- Athena over S3 logs instead of Logs Insights for ad-hoc analyses
# Find log groups without retention configured
aws logs describe-log-groups \
--query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \
--output table
# Set 30-day retention on all log groups under a prefix
# (text output is tab-separated on a single line, so split it before xargs)
aws logs describe-log-groups \
--log-group-name-prefix "/aws/lambda/" \
--query 'logGroups[?retentionInDays==`null`].logGroupName' \
--output text | tr '\t' '\n' | xargs -I{} aws logs put-retention-policy \
--log-group-name {} --retention-in-days 30
Line 4 — Orphan resources
The “ghosts” that cost a little each but accumulate. On an account that hasn’t been cleaned in 2 years, we routinely recover $200–$600/month.
Orphan EBS snapshots
Each snapshot costs $0.05/GB/month. Snapshots of long-deleted volumes are the #1 source of waste.
# Snapshots whose source volume no longer exists
# Caveat: snapshots backing a registered AMI can also match; check before deleting
aws ec2 describe-snapshots --owner-ids self \
--query 'Snapshots[?VolumeSize>`0`].[SnapshotId,VolumeId,VolumeSize,StartTime]' \
--output text | while read -r snap_id vol_id size date; do
# describe-volumes fails when the volume ID no longer exists
if ! aws ec2 describe-volumes --volume-ids "$vol_id" >/dev/null 2>&1; then
echo "ORPHAN: $snap_id ($size GB, created $date)"
fi
done
Unattached Elastic IPs
$0.005/hr, or $3.60/month per idle EIP. On an account with 30 ghost EIPs, that’s $100/month.
aws ec2 describe-addresses \
--query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
--output table
Empty Load Balancers
An ALB runs at ~$18/month minimum whether it has traffic or not. When you decommission an environment and forget the LB…
# Load balancers with no target group attached at all
aws elbv2 describe-load-balancers \
--query 'LoadBalancers[].[LoadBalancerArn,LoadBalancerName,State.Code]' \
--output text | while read arn name state; do
targets=$(aws elbv2 describe-target-groups \
--load-balancer-arn "$arn" \
--query 'length(TargetGroups)' --output text)
if [ "$targets" = "0" ]; then
echo "EMPTY ALB: $name"
fi
done
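The loop above only catches load balancers with no target group at all. A stricter variant, sketched here as one possible approach, also counts healthy targets through `aws elbv2 describe-target-health`. The function name is an assumption, and the check costs one API call per target group.

```shell
# Prints load balancers whose target groups contain no healthy target.
lbs_without_healthy_targets() {
  aws elbv2 describe-load-balancers \
    --query 'LoadBalancers[].[LoadBalancerArn,LoadBalancerName]' \
    --output text | while read -r arn name; do
    healthy=0
    # Sum healthy targets across every target group attached to this LB
    for tg in $(aws elbv2 describe-target-groups --load-balancer-arn "$arn" \
        --query 'TargetGroups[].TargetGroupArn' --output text); do
      n=$(aws elbv2 describe-target-health --target-group-arn "$tg" \
        --query 'length(TargetHealthDescriptions[?TargetHealth.State==`healthy`])' \
        --output text)
      healthy=$((healthy + n))
    done
    if [ "$healthy" -eq 0 ]; then
      echo "NO HEALTHY TARGETS: $name"
    fi
  done
}

# Usage: lbs_without_healthy_targets
```

A load balancer flagged here is not necessarily safe to delete (a deployment may be in progress), so treat the output as a review list.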
Detached EBS volumes
A detached EBS volume costs the same as an attached one. The classic oversight after a migration.
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' \
--output table
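To turn that listing into a single figure, the same query can be piped in text form into a small summing filter. This is a sketch: the function name is made up, and the $0.08/GB-month rate is an assumed gp3-like price that should be adjusted to your region and volume types.

```shell
# Sums the size of detached volumes and gives a rough monthly cost.
# Assumption: flat $0.08/GB-month, whatever the volume type.
sum_detached_gb() {
  # stdin rows: VolumeId, Size (GB), VolumeType, CreateTime
  awk '{ total += $2 }
       END { printf "%d GB detached, ~$%.2f/month at $0.08/GB\n", total, total * 0.08 }'
}

# Usage:
#   aws ec2 describe-volumes --filters Name=status,Values=available \
#     --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' \
#     --output text | sum_detached_gb
```

The total in GB is also the number to compare against an alert threshold in a recurring check.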
Line 5 — Mis-sized databases
Not really “hidden”, but often ignored. RDS and OpenSearch aren’t covered by Compute Optimizer in every configuration. Over-provisioning is common.
Questions to ask:
- Is prod running Multi-AZ when the workload isn’t critical?
- Is RDS storage on gp3 rather than gp2 (often cheaper at equal performance; check the price for your region and engine)?
- Are there manual snapshots older than 12 months still being stored?
- Is automated backup retention set to 35 days when 7 would do?
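Most of these questions can be answered from one `aws rds describe-db-instances` call. The filter below is a sketch that flags rows worth a second look; the function name, the flag labels and the 7-day cutoff are assumptions chosen to mirror the questions above.

```shell
# Flags RDS instances matching the review questions above.
flag_rds_rows() {
  # stdin rows: id, class, MultiAZ (True/False), StorageType, BackupRetentionPeriod
  awk '{
    notes = ""
    if ($3 == "True") notes = notes " multi-az"
    if ($4 == "gp2")  notes = notes " gp2-storage"
    if ($5 > 7)       notes = notes " backup-retention=" $5 "d"
    if (notes != "")  print $1 ":" notes
  }'
}

# Usage:
#   aws rds describe-db-instances \
#     --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass,MultiAZ,StorageType,BackupRetentionPeriod]' \
#     --output text | flag_rds_rows
```

A flag is a prompt for discussion, not a verdict: Multi-AZ on a critical production database is exactly what you want.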
The monthly hunting drill
Everything above must become a routine, not a one-off audit. Our hunting checks, run automatically (Lambda + Slack):
| Check | Frequency | Alert threshold |
|---|---|---|
| EBS snapshots without parent volume | Weekly | > $50 accumulated |
| Unattached EIPs | Weekly | > 5 EIPs |
| available EBS volumes | Weekly | > 100 GB total |
| ALBs without healthy targets | Monthly | > 0 |
| Log groups without retention | Monthly | > 10 groups |
| NAT Gateway data processed | Monthly | > 500 GB without S3 endpoint |
| EC2 over-provisioned (Compute Optimizer) | Monthly | savings > $100/month |
That’s day-to-day FinOps. Not one big yearly operation. A continuous, automated, alerted discipline.
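The core of such a routine is a threshold comparison followed by a notification. Here is a minimal sketch of both pieces, not the actual Lambda: the function names are made up, and `SLACK_WEBHOOK_URL` is a hypothetical environment variable holding a Slack incoming-webhook URL.

```shell
# Compares a measured value to its alert threshold.
check_threshold() {
  # $1 = check name, $2 = measured value, $3 = alert threshold
  if awk -v v="$2" -v t="$3" 'BEGIN { exit !(v > t) }'; then
    echo "ALERT: $1 = $2 (threshold $3)"
  else
    echo "ok: $1 = $2"
  fi
}

# Posts a message to Slack; silently does nothing if no webhook is configured.
# Note: the message is inserted into JSON as-is, so avoid double quotes in it.
notify_slack() {
  [ -n "${SLACK_WEBHOOK_URL:-}" ] || return 0
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "{\"text\": \"$1\"}" "$SLACK_WEBHOOK_URL" >/dev/null
}

# Usage:
#   msg=$(check_threshold "Unattached EIPs" 7 5)
#   case "$msg" in ALERT*) notify_slack "$msg" ;; esac
```

Keeping the comparison separate from the notification makes each check easy to test locally before wiring it to a schedule.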
Conclusion
Compute is visible, the rest is sneaky. On most SMB AWS accounts we audit, the “hidden” lines alone equal 1 to 3 junior engineers’ yearly salary. That money isn’t gone for good: it’s sitting in your bill, waiting to be recovered.
If you want a full bill audit, with a costed and prioritized 30/60/90-day action plan, that’s exactly what we do on the Architecture & DevSecOps Audit engagement.