⚠
Yesterday clean ✓ — May trajectory over cap, but an $11K AWS credit is pending.
Daily ✓: Saturday $1,513 (below baseline). Spike resolved. No new anomalies.
Monthly ⚠: May projecting ~$67K vs your $40K cap. Drivers: May 1 settle ($8K, recurring) + May 7-8 spike ($14K cost, $11K refund requested from AWS — human error, verbose logging) + intentional Neptune backfill (~$7.5K/mo above baseline) + ongoing c5.18xl experiment (~$5K/mo).
If AWS credit lands: May projection drops to ~$56K (still $16K over cap). If denied: ~$67K stands. Either way, June's lever question (see below) is what's actionable today.
May MTD: $31,580 (9d)
Forecast: $66,890
Credit pending: −$11,000?
Net forecast: ~$56–67K
Yesterday: $1,513 · ▼ −$92 vs $1,605 baseline ✓
May projected total: $56–67K · ▲ +$16–27K over $40K cap (range = $11K AWS credit pending)
Current daily baseline: $1,605 · elevated +$705/day from natural $900 base
Post-cleanup baseline target: $1,100 · = ~$33K/mo · within $35-37K target ✓
What needs your attention today
1. May is a write-off: the only question is whether to clear the runway for June
Operating model is now set: $40K cap, $35–37K target, daily review. Against that: May lands at $56–67K depending on whether AWS grants the $11K credit for the May 7-8 human-error spike. Even with the credit, May is $16K over cap. None of this can be undone for May — drivers are all known and in flight below. The forward question is June:
Baseline math — why $1,605/day is not your natural rate
| Pre-ramp natural baseline (Apr 11) | $862/day · $26K/mo |
| + c5.18xlarge × 3 (alex experiment) | +$300/day · +$9K/mo |
| + Neptune backfill (intentional) | +$160/day · +$5K/mo |
| + CloudWatch from LoginFeatureExtraction | +$120/day · +$4K/mo |
| + general drift (S3, Lambda, small) | +$163/day · +$5K/mo |
| Current elevated baseline (May) | $1,605/day · $48K/mo |
| Post-cleanup target (Neptune + c5 + CW reverse) | ~$1,100/day · ~$33K/mo ✓ |
The instinct is right: baseline should return to $1,000–1,200/day once the in-flight ramps reverse. That puts steady-state monthly spend at $30–36K (~$33K at the $1,100/day target), at or below your $35–37K target and under your cap.
To hit the June cap, there are two levers:
① Neptune downsize after backfill: saves ~$5K/mo (the +$160/day in the table above), and could also recover part of the +$300/day EC2 line if Neptune-adjacent compute scales down with it.
② 3× c5.18xlarge shutdown — saves ~$9K/mo. Owner alex (data team).
If both land before June 1 → June lands at ~$33K, under the $40K cap with ~$7K headroom. If only Neptune lands → ~$43K (over cap). If only c5s land → ~$39K (just under). If neither → ~$48K. The brief tracks drift daily; June is the deadline this brief is built to defend.
Owner: you. Conversations needed: Kostya/Emile (Neptune end-date), alex (c5.18xl status). Both tracked in In-flight below — today's job is to put them on the calendar with explicit "before June 1" framing.
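The June scenarios can be reproduced from the baseline table with a few lines of arithmetic. This is a minimal sketch, not the brief's actual forecasting logic: the per-day figures are copied from the table, while the 30-day month, the lever names, and the function name are assumptions made for illustration. The results can differ by about $1K from the rounded figures in the prose, and the "both levers" case here excludes the CloudWatch reversal that the $1,100/day target includes.

```python
# Daily figures copied from the baseline table above.
CURRENT_BASELINE_PER_DAY = 1605
LEVER_SAVINGS_PER_DAY = {
    "neptune_downsize": 160,  # "+ Neptune backfill (intentional)" row
    "c5_shutdown": 300,       # "+ c5.18xlarge x 3" row
}
CAP_PER_MONTH = 40_000
DAYS_IN_JUNE = 30  # assumption: simple 30-day projection


def june_projection(levers_landed):
    """Return projected June spend (USD) given the levers that land by June 1."""
    saved = sum(LEVER_SAVINGS_PER_DAY[name] for name in levers_landed)
    return (CURRENT_BASELINE_PER_DAY - saved) * DAYS_IN_JUNE


SCENARIOS = {
    "neither": [],
    "only Neptune": ["neptune_downsize"],
    "only c5s": ["c5_shutdown"],
    "both": ["neptune_downsize", "c5_shutdown"],
}

for label, levers in SCENARIOS.items():
    total = june_projection(levers)
    status = "over cap" if total > CAP_PER_MONTH else "under cap"
    print(f"{label:<13} ${total:>6,} ({status})")
```

Keeping the levers in a dictionary makes it easy to add a third one (for example the CloudWatch reversal at $120/day) and see how much closer that gets to the $1,100/day target.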
June deadline
Daily change log — yesterday vs day-before
Service Δ · May 9 vs May 8
| Service | May 9 | May 8 | Δ |
| AmazonCloudWatch | $35 | $5,633 | −$5,597 |
| AWS Lambda | $6 | $1,267 | −$1,261 |
| Amazon S3 | $32 | $801 | −$769 |
| Amazon DynamoDB | $9 | $37 | −$28 |
| EC2 — Other | $106 | $124 | −$18 |
| EC2 — Compute | $384 | $399 | −$15 |
| RDS | $115 | $120 | −$4 |
| Neptune | $310 | $314 | −$4 |
All "expected" drops = spike resolution. Nothing new moved up.
Account Δ · May 9 vs May 8
| Account | May 9 | May 8 | Δ |
| gh-stg + gh-prod | $496 | $8,170 | −$7,674 |
| shared-dev | $321 | $336 | −$15 |
| org / payer | $271 | $271 | $0 |
| ug-prod | $119 | $129 | −$10 |
| global | $50 | $57 | −$7 |
| za-prod | $44 | $42 | +$1 |
| za-stg | $26 | $27 | −$1 |
| ug-stg | $21 | $22 | −$1 |
gh-stg+prod is the only account that moved (resolution). All others are within ±$15.
Weekly change log — last 7d vs prior 7d
Service Δ · May 3–9 vs Apr 26–May 2
| Service | Last 7d | Prior 7d | Δ |
| AmazonCloudWatch | $8,499 | $383 | +$8,116 |
| AWS Lambda | $1,743 | $48 | +$1,694 |
| Amazon S3 | $1,338 | $242 | +$1,096 |
| EC2 — Compute | $2,594 | $1,601 | +$993 |
| Amazon Neptune | $2,054 | $1,525 | +$529 |
| SP for AWS ML | $535 | $173 | +$363 |
| EC2 — Other | $835 | $757 | +$78 |
| EKS control plane | $215 | $156 | +$59 |
| SP for AWS Compute | $995 | $995 | $0 |
| Amazon ElastiCache | $201 | $370 | −$169 |
| Amazon RDS | $868 | $1,447 | −$579 |
| Tax | $421 | $5,307 | −$4,886 |
Top 3 ↑ rows are spike-driven. Tax ↓ is the May 1 settle moving out of the prior-7d window.
Account Δ · May 3–9 vs Apr 26–May 2
| Account | Last 7d | Prior 7d | Δ |
| gh-stg + gh-prod | $14,337 | $6,052 | +$8,285 |
| za-prod | $246 | $75 | +$171 |
| shared-dev | $2,314 | $2,450 | −$136 |
| org / payer | $2,291 | $2,586 | −$295 |
| ug-prod | $1,190 | $1,439 | −$249 |
| za-stg | $203 | $267 | −$64 |
| global (ECR) | $402 | $953 | −$551 |
| others (combined) | $741 | $905 | −$164 |
gh-stg+prod is the only account with a material increase, entirely the May 7-8 spike. za-prod rose $171 from a small base; all other accounts are down.
Service drift from Feb 2026 — golden baseline
Per-service drift · Apr 2026 vs Feb 2026 baseline (April = last clean full month, no spike noise)
| Service | Feb (base) | Mar | Apr | Δ vs Feb | Verdict |
| Amazon Neptune | $3,067 | $2,663 | $7,995 | +$4,929 · 2.6× | In-flight: backfill |
| EC2 — Compute | $1,317 | $1,696 | $4,082 | +$2,766 · 3.1× | Drift (alex c5.18xl + Karpenter) |
| Tax | $5,000 | $5,478 | $6,608 | +$1,608 · 1.3× | Passive — grows with everything |
| SP for AWS Database Usage | $0 | $1,455 | $1,475 | +$1,475 · NEW | Verify commitment was right-sized |
| Amazon SageMaker | $93 | $160 | $412 | +$319 · 4.4× | Small base — watch |
| Amazon Rekognition | $644 | $83 | $77 | −$567 · 0.1× | ✓ face-dup scan moved off |
| Amazon ECR | $891 | $406 | $359 | −$532 · 0.4× | ✓ image cleanup paid off |
| Bedrock (Sonnet 4.5) | $956 | $863 | $515 | −$441 · 0.5× | ✓ usage trending down |
| Amazon RDS | $4,176 | $4,529 | $3,890 | −$286 · 0.9× | ✓ on baseline |
| SP for AWS Compute | $4,025 | $4,456 | $4,266 | +$241 · 1.1× | ✓ on baseline |
| AmazonCloudWatch | $1,114 | $1,342 | $1,266 | +$152 · 1.1× | ✓ on baseline (May spike unrelated) |
| EC2 — Other | $3,246 | $3,519 | $3,156 | −$90 · 1.0× | ✓ on baseline |
| Amazon S3 | $1,103 | $1,133 | $1,020 | −$84 · 0.9× | ✓ on baseline |
| SP for AWS ML | $2,131 | $2,359 | $2,055 | −$76 · 1.0× | ✓ on baseline |
| Total org (ex-Snowflake $40K one-off) | $33,891 | $37,110 | $43,851 | +$9,960 · +29% | Net drift over 2 months |
The diagnosis: 14 of 18 top services are at or below Feb baseline. All the drift comes from 4 services: Neptune (+$4.9K — intentional backfill, reverses when complete), EC2 Compute (+$2.8K — alex c5.18xl + general Karpenter scale), Tax (+$1.6K — passive growth), and the new Database SP commitment (+$1.5K). Net +$10K/mo drift over 2 months. Strip Neptune-when-backfill-ends and the EC2 governance fix → drift collapses to ~$2K/mo, well within "minor growth" territory.
4-week rolling totals — raw vs normalized (normalized = excludes monthly settle + spike days)
| Week | Raw total | Normalized | Excludes | vs prior wk (norm.) |
| Apr 12 – Apr 18 | $10,924 | $10,924 | none | baseline |
| Apr 19 – Apr 25 | $9,071 | $9,071 | none | −$1,853 (−17%) |
| Apr 26 – May 2 | $15,488 | $7,515 | May 1 settle ($7,973) | −$1,556 (−17%) |
| May 3 – May 9 (last 7d) | $22,273 | $8,297 | May 7+8 spike ($13,975) | +$782 (+10%) |
Underlying business spend is healthy. Once you exclude the monthly settle and the spike events, weekly run-rate has been $7.5K–$10.9K and is roughly flat. The "+44% WoW" headline at the top is entirely the May 7-8 spike. If PR #567 closes today and no new spikes appear, next week's raw should land back at ~$8K.
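The normalization in the table above is plain subtraction followed by a week-over-week comparison. The sketch below recomputes it from the table's raw totals and excluded amounts; the data layout and function name are illustrative assumptions, not the brief's actual pipeline. Because the inputs are already rounded to whole dollars, the last row comes out $1 different from the table ($8,298 and +$783).

```python
# (week label, raw total USD, excluded USD) copied from the table above.
WEEKS = [
    ("Apr 12 - Apr 18", 10_924, 0),
    ("Apr 19 - Apr 25", 9_071, 0),
    ("Apr 26 - May 2", 15_488, 7_973),   # May 1 settle excluded
    ("May 3 - May 9", 22_273, 13_975),   # May 7+8 spike excluded
]


def normalized_series(weeks):
    """Return (label, normalized, delta_vs_prior, pct_vs_prior) for each week.

    The first week has no prior week, so its delta and pct are None.
    """
    rows = []
    prev = None
    for label, raw, excluded in weeks:
        norm = raw - excluded
        if prev is None:
            rows.append((label, norm, None, None))
        else:
            delta = norm - prev
            rows.append((label, norm, delta, round(100 * delta / prev)))
        prev = norm
    return rows


for label, norm, delta, pct in normalized_series(WEEKS):
    change = "baseline" if delta is None else f"{delta:+,} ({pct:+d}%)"
    print(f"{label:<16} ${norm:>6,}  {change}")
```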
In-flight — owners working it, track but don't re-investigate
- LoginFeatureExtraction batch + log-level fix: data-platform PR #567 by Emile. Targets the asyncio.gather fan-out and the per-invocation INFO logs that drove $10.6K over 2 days. Lera tracking in #7may-logs-indcident. Watch for: CW Logs back at baseline through the next backfill window.
- AWS credit request for May 7-8 spike (~$11K of the $14K incurred): submitted to AWS as human-error reimbursement (verbose logging in misfiring Lambda). Watch for: AWS Support response, typically 5-10 business days. If granted, brief auto-revises May projection downward (~$56K instead of ~$67K). If denied, the full $14K stays in the May number; the June lever decision is unaffected either way.
- Backfill governance: decided May 10. Action items already exist for triggering the next backfill, with cost monitoring built in. Pattern is being addressed at the trigger level rather than via a separate process review.
- Neptune downsize: decided May 10. Scale down only when graph-feature backfill is complete. Re-evaluate trigger: backfill completion date, needed by mid-May to give the team runway to right-size before June 1 (see exec item above).
- 3× c5.18xlarge in shared-dev (alex): feature-prefilter experiment. ~$5K/mo while running. Owner action: Platform to ping alex in #7may-logs-indcident thread for status / TTL, needed before June 1 to keep June under cap.
- CloudWatch Logs no-retention on data-platform Lambdas: 2.06 TB stored on ghana-prod-LoginFeatureExtraction alone. Owner action: Platform to apply aws logs put-retention-policy --retention-in-days 14 to all /aws/lambda/* in gh-prod + ug-prod (one batch).
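The retention batch in the last item could be scripted roughly as follows. This is a hedged sketch, assuming boto3 and credentials for each target account; the function name, the dry-run flag, and the choice to skip groups that already have a policy are illustrative assumptions, not an agreed procedure. It uses the CloudWatch Logs `describe_log_groups` paginator and `put_retention_policy` call.

```python
def apply_retention(logs_client, prefix="/aws/lambda/", days=14, dry_run=True):
    """Set a retention policy on every log group under `prefix` that has none.

    `logs_client` is a CloudWatch Logs client, e.g. boto3.client("logs").
    Returns the names of the log groups that were (or, in dry-run, would be) updated.
    """
    changed = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            # Groups with no retention policy have no "retentionInDays" key.
            if "retentionInDays" in group:
                continue
            name = group["logGroupName"]
            if not dry_run:
                logs_client.put_retention_policy(
                    logGroupName=name, retentionInDays=days
                )
            changed.append(name)
    return changed


# Usage (run once per account, e.g. gh-prod and ug-prod):
#   import boto3
#   client = boto3.client("logs")
#   print(apply_retention(client, dry_run=True))   # review the list first
#   apply_retention(client, dry_run=False)
```

Defaulting to a dry run lets Platform review the affected log groups before any stored data becomes eligible for expiry.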
Tomorrow's automatic checks (re-run this brief Mon morning)
- CloudWatch + Lambda in gh-stg+prod — alert if >$200 in a single day after PR #567 merges
- Neptune (gh-prod) — alert if >$400/day (today $310) or if a 3rd instance appears
- shared-dev EC2 — alert if a c5.18xlarge / m5.24xlarge / r5.24xlarge launches
- Any account that wasn't in yesterday's spend — alert if a new linked-account ID shows non-zero spend
- SP for AWS Compute — still committed at $4.3K/mo; verify utilization stays >80%
- 1st-of-month settle (June 1) — RIs+SPs+Tax bundle, expected ~$8K, do not flag as anomaly
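The threshold-style checks in this list can be expressed as a small rule evaluation. The sketch below is only an illustration: the thresholds and instance types come from the list above, but the input shapes, key names, and function name are hypothetical, and it covers only the spend thresholds, the instance-type watch, and the new-account check (not the Savings Plan utilization or settle rules).

```python
# Daily spend limits (USD) copied from the checklist above.
DAILY_LIMITS = {
    ("gh-stg+prod", "CloudWatch+Lambda"): 200,
    ("gh-prod", "Neptune"): 400,
}
WATCHED_INSTANCE_TYPES = {"c5.18xlarge", "m5.24xlarge", "r5.24xlarge"}


def daily_alerts(costs, shared_dev_launches, known_accounts, spend_by_account):
    """Return a list of alert strings for one day's data.

    costs: {(account, service): usd}           (hypothetical input shape)
    shared_dev_launches: iterable of instance types launched in shared-dev
    known_accounts: set of account names seen in yesterday's spend
    spend_by_account: {account: usd} for the day being checked
    """
    alerts = []
    for (account, service), limit in DAILY_LIMITS.items():
        spend = costs.get((account, service), 0)
        if spend > limit:
            alerts.append(f"{service} in {account}: ${spend:,.0f} > ${limit}")
    for itype in sorted(set(shared_dev_launches) & WATCHED_INSTANCE_TYPES):
        alerts.append(f"shared-dev launched {itype}")
    for account, spend in sorted(spend_by_account.items()):
        if account not in known_accounts and spend > 0:
            alerts.append(f"new account with spend: {account}")
    return alerts
```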
Daily total · last 14 days (Apr 26 – May 9)
[Chart: daily totals, Apr 26 to May 9. Legend: baseline ($1,200–1,700) · monthly settle (expected) · real spike (in-flight: PR #567) · yesterday, back to baseline ✓]
Standing cleanup backlog — doesn't change daily, do when you have a window
| 5+ idle SageMaker endpoints (za-dev / zm-dev / zm-stg document-labeling, 2 dev RiskModel) | stale 30+ days | ~$300/mo |
| CW Logs retentionInDays: None on data-platform Lambdas | 2.06 TB stored | ~$60/mo + caps blowup |
| 2 unattached EIPs in ug-prod | unknown | ~$7/mo |
| EC2 owner-tag enforcement SCP (only ~30% of shared-dev tagged) | no policy yet | prevents next surprise |
Snowflake — month trajectory
⚠
May 7 spike to $1,169 (6× normal) — eden.b loan-account-history backfill.
May trajectory: MTD $2,673 across 11 days. If May 7 spike doesn't repeat, May normalizes to ~$4,500 — under the $5,245 Feb baseline ✓.
The spike driver: a NEW warehouse BACKFILL_LOAN_ACCOUNT appeared 7d ago at 261 credits ($990) vs $0 prior week. All 3,396 queries by eden.b (ACCOUNTADMIN). 3,356 of those are MERGE statements into GHANA_PROD.SEMANTIC_LAYER.LOAN_ACCOUNT_HISTORY_BACKFILL, scanning 38.9 TB.
This is a SECOND backfill — separate from the fido-score backfill (PR #567) that drove the AWS spike. Two concurrent data-team backfills.
May MTD: $2,673
Excl. May 7: $1,504 / 10d
Projection: ~$4,500
vs Feb base: −$745 ✓
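The two Snowflake projections follow from a simple run-rate extrapolation. A minimal sketch, assuming a 30-day horizon (which is what reproduces the brief's ~$7,290 raw figure) and that the ex-spike projection drops May 7 from both the total and the day count; the function name is illustrative.

```python
MTD_USD = 2_673          # May month-to-date, 11 days
MTD_DAYS = 11
SPIKE_USD = 1_169        # May 7 spike
FEB_BASELINE_USD = 5_245
HORIZON_DAYS = 30        # assumption: matches the ~$7,290 raw trend


def project(total_usd, days_observed, horizon_days=HORIZON_DAYS):
    """Extrapolate the observed daily run rate over the horizon."""
    return total_usd / days_observed * horizon_days


raw = project(MTD_USD, MTD_DAYS)                        # spike included
ex_spike = project(MTD_USD - SPIKE_USD, MTD_DAYS - 1)   # May 7 removed

print(f"raw trend:       ${raw:,.0f}")
print(f"ex-spike trend:  ${ex_spike:,.0f}")
print(f"ex-spike vs Feb: ${ex_spike - FEB_BASELINE_USD:+,.0f}")
```

The ex-spike trend comes out at about $4,512, which the brief rounds to ~$4,500; the −$745 gap to the Feb baseline uses the rounded figure.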
Yesterday (Snowflake): $198 · vs $155 trailing baseline · normal
May 7 spike: $1,169 · 309 credits · 6× normal day
May projected (Snowflake): ~$4,500 · Under Feb baseline $5,245 ✓ (excluding May 7)
BACKFILL_LOAN_ACCOUNT WH: $990 · 7d · new warehouse · eden.b · 38.9 TB scanned
Top warehouses · last 7d
| Warehouse | 7d $ | vs prior wk |
| BACKFILL_LOAN_ACCOUNT | $990 | NEW · $0 prior |
| COMPUTE_WH | $755 | +42% |
| ANALYTICS_WH | $445 | −3% |
| DEV_STG_WH | $128 | +75% (small base) |
| CLOUD_SERVICES_ONLY | $2 | trivial |
Strip BACKFILL_LOAN_ACCOUNT and 7d total drops from ~$2.3K to ~$1.3K (back to baseline).
Monthly trend · Snowflake
| Month | Credits | USD | vs Feb base |
| Nov 2025 | 1,379 | $5,211 | −1% |
| Dec 2025 | 1,799 | $6,802 | +30% |
| Jan 2026 | 1,450 | $5,479 | +4% |
| Feb 2026 (baseline) | 1,388 | $5,245 | — |
| Mar 2026 | 1,043 | $3,942 | −25% |
| Apr 2026 | 1,169 | $4,421 | −16% |
| May MTD (11d) | 707 | $2,673 | trending ~$7,290 raw / ~$4,500 ex-spike |
$/credit = $3.78 at Fido (NOT $2 docs default).
Combined BVA context
Two concurrent data-team backfills
Both intentional, both currently inflating cost. Worth tracking when each ends so projections normalize.
- Fido-score backfill (Emile, data-platform): drove the AWS May 7-8 spike ($14K incurred, $11K credit pending) and ongoing data-fido-score-models errors. Fix in PR #567.
- Loan account history backfill (eden.b): drove the Snowflake May 7 spike ($1,169 in one day). 3,356 MERGEs into SEMANTIC_LAYER.LOAN_ACCOUNT_HISTORY_BACKFILL, 38.9 TB scanned. No public PR tracker yet; worth asking Eden when it completes and whether the new BACKFILL_LOAN_ACCOUNT warehouse stays or gets dropped.