
Holdout Test

A randomized experiment that withholds a marketing treatment from a share of the audience to measure its incremental lift.

Also known as: Holdout Group, Holdout Experiment, Control Group Test, Suppression Test, Audience Holdout

A holdout test is a randomized experiment that withholds a marketing treatment (an ad, email, SMS, push) from a defined share of the audience — the holdout group — while a comparable test group receives it. Assignment must be randomized at the user level before the send; post-hoc “lookalike” holdouts reintroduce the selection bias randomization is meant to eliminate. Lift is the effect relative to the untreated baseline:

$$\text{Lift} = \frac{\text{Treated} - \text{Holdout}}{\text{Holdout}}$$

Some decks divide by the treated group's figure instead, which yields a smaller number for the same data; always state which denominator a lift figure uses.
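As a rough illustration of the two ideas above (user-level random assignment before the send, and lift measured against the untreated baseline), here is a minimal Python sketch. Hash-based bucketing is one common way to get stable pseudo-random assignment; it is an assumption here, not a method the definition prescribes. The function names, the salt string, and the conversion rates are hypothetical and chosen only for the example.

```python
import hashlib


def assign_group(user_id: str, salt: str, holdout_share: float) -> str:
    """Deterministically place a user in 'holdout' or 'test' before the send."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "holdout" if bucket < holdout_share else "test"


def lift(treated_rate: float, holdout_rate: float, denominator: str = "holdout") -> float:
    """Relative lift; the denominator must be named explicitly when reporting."""
    if denominator not in ("holdout", "treated"):
        raise ValueError("denominator must be 'holdout' or 'treated'")
    base = holdout_rate if denominator == "holdout" else treated_rate
    if base == 0:
        raise ValueError("lift is undefined when the denominator is zero")
    return (treated_rate - holdout_rate) / base


# Hypothetical audience and a 10% holdout share.
users = [f"user-{i}" for i in range(10_000)]
groups = {u: assign_group(u, "example-campaign-salt", 0.10) for u in users}
holdout_count = sum(1 for g in groups.values() if g == "holdout")

# Made-up conversion rates: 5.0% in the test group, 4.0% in the holdout.
lift_on_holdout = lift(0.050, 0.040)             # about 0.25
lift_on_treated = lift(0.050, 0.040, "treated")  # about 0.20
```

The same one-point gap in conversion rate reads as roughly 25% on the holdout denominator and roughly 20% on the treated denominator, which is why the denominator needs to be stated. Using a different salt per experiment keeps one test's holdout from overlapping systematically with another's.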

Holdouts are the credible read on channels where last-click attribution overstates impact: branded search, retargeting, email, SMS, push. These touchpoints disproportionately reach users who are already on their way to converting, so platform-reported revenue measures proximity to the purchase, not cause. Run inside an ad platform, the same design is called a conversion lift test.

The honest read is often uncomfortable. A high-frequency email flow reporting strong platform-attributed revenue-per-recipient may show meaningfully lower incremental revenue on a holdout — the gap depends on the flow, the audience, and existing organic demand. The only honest number is the one a brand’s own holdout produces.

The cost of a holdout has two meanings. Revenue held out is the attributed revenue tied to the sends the brand chose to withhold. Revenue actually lost is only the incremental portion of that; on low-incrementality channels, it is a small fraction of the held-out figure.
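The gap between the two cost figures can be made concrete with a small calculation. This is a sketch under assumptions: it treats "held out" as the platform-attributed revenue the withheld users would have been credited with, and every number below is invented for illustration, not a benchmark.

```python
def holdout_cost(holdout_users: int,
                 attributed_rev_per_recipient: float,
                 incremental_rev_per_recipient: float) -> tuple[float, float]:
    """Return (revenue held out, revenue actually lost) for a holdout group."""
    # What the platform would have credited to these users had they been sent to.
    held_out = holdout_users * attributed_rev_per_recipient
    # What the send would truly have added, per the holdout's own measurement.
    lost = holdout_users * incremental_rev_per_recipient
    return held_out, lost


# Hypothetical flow: 5,000 users held out, $2.00 attributed per recipient,
# $0.25 incremental per recipient.
held_out, lost = holdout_cost(5_000, 2.00, 0.25)
lost_fraction = lost / held_out
```

With these made-up inputs, the held-out figure is $10,000 while the revenue actually forgone is $1,250, or one eighth of it.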

Holdout size should be chosen from the expected effect size and the sample needed to detect it with adequate power, not a universal percentage. Owned-channel programs commonly run small recurring holdouts in the single-digit-to-low-double-digit range, but that’s practice, not a target. Brands pair them with geo-lift or marketing-mix modeling against MER for paid media, where user-level randomization isn’t practical.
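One way to turn "expected effect size and adequate power" into a number is the standard normal-approximation sample size for comparing two proportions with unequal group sizes. The sketch below assumes a conversion-rate outcome, a two-sided test, and a fixed holdout share; the function name `holdout_size` and the default `alpha` and `power` values are illustrative choices, not recommendations from the definition above.

```python
import math
from statistics import NormalDist


def holdout_size(baseline_rate: float, expected_lift: float, holdout_share: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of holdout users needed to detect the expected lift."""
    if not 0 < holdout_share < 1:
        raise ValueError("holdout_share must be strictly between 0 and 1")
    if expected_lift == 0:
        raise ValueError("expected_lift must be non-zero")
    p_holdout = baseline_rate
    p_test = baseline_rate * (1 + expected_lift)  # lift relative to the holdout
    ratio = (1 - holdout_share) / holdout_share   # test users per holdout user
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Variance of the difference in rates, scaled by the holdout group size.
    variance = p_holdout * (1 - p_holdout) + p_test * (1 - p_test) / ratio
    n_holdout = (z_alpha + z_beta) ** 2 * variance / (p_test - p_holdout) ** 2
    return math.ceil(n_holdout)


# Hypothetical inputs: 4% baseline conversion, 10% expected relative lift, 10% holdout.
needed = holdout_size(baseline_rate=0.04, expected_lift=0.10, holdout_share=0.10)
needed_big_effect = holdout_size(baseline_rate=0.04, expected_lift=0.30, holdout_share=0.10)
```

With these assumed inputs the holdout alone needs on the order of twenty thousand users, and the requirement falls sharply as the expected lift grows, which is why a fixed percentage cannot serve every program.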
