What you're looking at
Each Sankey traces audience refresh and create requests through every layer of the pipeline: RBA (rule-based audience engine) → AMC workflow (query execution) → AMS (orchestrator + state-machine routing) → Igno (staging file write) → DSP / SA activation. Every node shows count + percentage of its parent. Hover any node or band for the absolute count and percentage of the overall pipeline.
Open a diagram
90-Day Pipeline Sankey
Aggregate flow over the full 90-day analysis window. Best for understanding the overall pipeline shape: ~80 % of submissions succeed end-to-end; SA refreshes dominate volume; STARK lookalikes underperform, with a refresh-success rate 31 pp lower than RULE_BASED.
1-Day Pipeline Sankey
Snapshot of a typical day. Use this to see what daily volumes look like in practice: the same shape as the 90-day chart, scaled to one day. Useful for sanity-checking SLO/throughput discussions.
How the data was extracted
- RBA submissions / successes / failures from andes.amcactionsanalyticsprovider.amc_rule_based_audiences (90 days, deduped per (simpleaudienceid, refreshattempts)).
- AMS volume + dual-write artifact correction from andes.amctainterfaces.amc_audience_management_service (filtered to sortkey LIKE 'LATEST_VERSION#%', deduped per (audiencemetadataid, currentversionid) preferring the row with non-empty stepfunctionexecutionarn; raw rows are 2× inflated for refresh-success due to a pre-SF + post-SF dual write in AudienceMetadataUpdater.kt).
- Routing share (Igno vs Legacy) from AWS Step Functions CloudWatch metrics (AWS/States, last-24 h ground truth): 96,048 IgnoAudienceStateMachine executions vs 28,567 UpdateAudience-prod-us-east-1 = 77 % Igno / 23 % legacy.
- No-op refresh share (0.19 %) from CloudWatch EmptyDataFrame on the Igno staging Glue job. (Earlier estimates of 12-26 % were Iceberg-snapshot-timing artifacts, refuted by SF execution history, direct S3 object-version enumeration, and the CloudWatch metric.)
- Audience-type breakdown (RULE_BASED vs STARK / DISPLAY vs SPONSORED_ADS) from the same RBA + AMS Andes tables, computed per-bucket and applied to the deduped daily series.
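The AMS dedup rule and the routing-share arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the actual extraction query: the function names `dedupe_latest_versions` and `routing_share` are hypothetical, rows are assumed to be plain dicts keyed by the column names quoted above, and `startswith` is used as a close stand-in for the SQL `LIKE` filter.

```python
def dedupe_latest_versions(rows):
    """Keep one row per (audiencemetadataid, currentversionid).

    Hypothetical sketch: when two rows share a key, the row with a
    non-empty stepfunctionexecutionarn wins, which removes the
    pre-SF / post-SF dual-write duplicate.
    """
    best = {}
    for row in rows:
        # Approximates sortkey LIKE 'LATEST_VERSION#%'. In SQL LIKE the
        # underscore is a single-character wildcard, so this prefix test
        # is slightly stricter than the original filter.
        if not row["sortkey"].startswith("LATEST_VERSION#"):
            continue
        key = (row["audiencemetadataid"], row["currentversionid"])
        kept = best.get(key)
        has_arn = bool(row.get("stepfunctionexecutionarn"))
        if kept is None or (not kept.get("stepfunctionexecutionarn") and has_arn):
            best[key] = row
    return list(best.values())


def routing_share(igno_executions, legacy_executions):
    """Return (igno_fraction, legacy_fraction) of total executions."""
    total = igno_executions + legacy_executions
    return igno_executions / total, legacy_executions / total
```

With the counts quoted above, `routing_share(96048, 28567)` gives fractions that round to 77 % Igno and 23 % legacy.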
All numbers in the Sankeys reconcile arithmetically with delta = 0 at every
stage. The build script (build-sankey.py) prints a sanity check of every
subset → parent sum at render time.
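A reconciliation check of this kind might look like the sketch below. It is not the code in build-sankey.py, whose internals are not shown here: the function names and the `stages` structure (stage name mapped to a parent count and a dict of child counts) are assumptions made for illustration.

```python
def reconciliation_deltas(stages):
    """Return {stage_name: parent_count - sum(child_counts)}.

    `stages` is a hypothetical structure:
    {stage_name: (parent_count, {child_name: child_count})}.
    """
    return {
        name: parent - sum(children.values())
        for name, (parent, children) in stages.items()
    }


def assert_reconciled(stages):
    """Print each stage's delta and raise if any delta is non-zero."""
    deltas = reconciliation_deltas(stages)
    for name, delta in deltas.items():
        print(f"{name}: delta = {delta}")
    mismatched = {name: d for name, d in deltas.items() if d != 0}
    if mismatched:
        raise ValueError(f"stages do not reconcile: {mismatched}")
    return deltas
```

For example, taking the routing counts from the extraction notes and treating their sum (124,615) as the parent, `assert_reconciled({"routing": (124615, {"Igno": 96048, "Legacy": 28567})})` passes with a delta of 0.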
Key takeaways
- ~80 % of submissions succeed end-to-end (RBA → AMS → Igno → DSP/SA). Most of the 20 % failure share is "audience too small / empty" — >93 % of failures at every layer trace to that single root cause.
- STARK lookalikes underperform. 10 % of refresh volume but 27 % of refresh failures. Refresh-success rate 55 % vs 86 % for RULE_BASED. The 32 % of STARK failures driven by Insufficient-Seed-Match + Max-Size-Exceeded are lookalike-specific and addressable with pre-validation at RBA-submit time.
- 72 K SA audiences refresh forever. The DSP-segment-usage termination gate (priority 999 after 100 days unused) is DISPLAY-only. Sponsored-Ads RBA audiences have no usage-based gate — they keep refreshing daily until the customer deletes them. Of the active universe, 72,423 SA audiences are 180+ days old and still actively refreshing.
- 80 % of prioritization-Lambda fires are no-ops by design. EventBridge fires daily for every audience, but PrioritizationCalculator gates by per-audience cadence (RULE_BASED default 1 day, STARK default 7 days). 1.04 M Lambda invocations/day are pure no-op wakeups, which is an opportunity to delete the schedule on the 100-day-unused path instead of letting it fire forever.
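To make the cadence gate in the last takeaway concrete, here is a minimal sketch of how a daily wakeup can turn into a no-op. It is not the real PrioritizationCalculator logic, which is not shown in this document: the function name `is_refresh_due`, its parameters, and the comparison rule are assumptions; only the default cadences (1 day for RULE_BASED, 7 days for STARK) come from the text above.

```python
from datetime import datetime, timedelta

# Default cadences quoted in the takeaway above.
DEFAULT_CADENCE_DAYS = {"RULE_BASED": 1, "STARK": 7}


def is_refresh_due(audience_type, last_refresh, now, cadence_days=None):
    """Return True if the audience's cadence has elapsed since last_refresh.

    Hypothetical gate: a daily schedule fire for which this returns False
    is a no-op wakeup.
    """
    if cadence_days is None:
        cadence_days = DEFAULT_CADENCE_DAYS[audience_type]
    return now - last_refresh >= timedelta(days=cadence_days)
```

Under this sketch, a STARK audience on the default cadence that is woken every day would pass the gate on only one fire in seven; the other six fires do no work.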