Minimum Detectable Effect

The smallest true lift an A/B test could reliably call statistically significant given its sample size, baseline conversion rate, statistical power, and significance level — the floor of what the experiment can see, not a prediction of what it will produce.

Also known as: MDE, Minimum Detectable Lift, MDL, Detectable Effect Size

Minimum detectable effect (MDE) is fixed before the test starts by four inputs: sample size, baseline conversion rate, statistical power, and significance level. It describes what the experiment is able to resolve, not what the experiment will find. Many “no effect” reads on small direct-to-consumer (DTC) traffic are not flat experiments; they are tests that were underpowered from the start, where the lift the operator hoped for sat below the floor the test could resolve.

What sets the floor

Four levers move MDE.

Baseline rate comes first. A lower baseline conversion rate widens MDE in relative terms, because the smallest absolute lift a test can resolve shrinks only with the square root of the baseline while the baseline itself shrinks in full. Halving a small baseline cuts the absolute floor by about 30%, so the floor expressed as a percent of baseline grows by about 40%.

Sample size per variant comes second. MDE is inversely proportional to the square root of sessions per variant, so doubling traffic multiplies MDE by 1/√2 ≈ 0.71: it narrows by about 30%, not 50%. Halving MDE takes four times the traffic.

Statistical power comes third. The convention is 80% — if the true effect equals MDE, the test has an 80% chance of detecting it. Raising power widens MDE; some teams accept 70% on small traffic to narrow it at the cost of more false negatives.

Significance level comes fourth. The convention is 5% two-sided: a 5% chance of calling a truly flat experiment significant in one direction or the other. Tightening it (for example, to 1%) widens MDE.
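The four levers can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a proper calculator: it assumes a two-variant test on a conversion rate, an even traffic split, and the normal approximation with both arms' variance taken at the baseline rate. The function name `relative_mde` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(baseline, n_per_variant, power=0.80, alpha=0.05):
    """Approximate relative MDE for a two-variant conversion test.

    Simplification: both arms' variance is evaluated at the baseline rate.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    std_err = sqrt(2 * baseline * (1 - baseline) / n_per_variant)
    absolute_mde = (z_alpha + z_power) * std_err
    return absolute_mde / baseline


# Illustrative inputs only: move one lever at a time against a reference case.
reference = relative_mde(baseline=0.02, n_per_variant=20_000)
print(f"reference case:        {reference:.1%}")
print(f"half the baseline:     {relative_mde(0.01, 20_000):.1%}")
print(f"double the sample:     {relative_mde(0.02, 40_000):.1%}")
print(f"90% power:             {relative_mde(0.02, 20_000, power=0.90):.1%}")
print(f"1% significance level: {relative_mde(0.02, 20_000, alpha=0.01):.1%}")
```

Only the doubled sample narrows the floor; the other three changes widen it.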

A simple storefront example

Imagine a Shopify-shaped store with roughly 50,000 monthly sessions and a 2% baseline conversion rate, running a two-week A/B test at 80% power and 95% confidence. Two weeks of that traffic, split evenly, is roughly 11,500 sessions per variant. An online sample-size calculator will return an MDE somewhere in the 25–30% relative range for inputs in that rough shape; the exact figure depends on the per-variant traffic split and the variance of the primary metric. Even a full month of traffic only brings the floor down to the high teens.

Translated: this storefront cannot reliably detect a 5% relative lift in two weeks, even if one exists. The number is illustrative; the shape is the failure mode behind most conversion rate optimization (CRO) disappointments at small scale.

When not to run the test

The honest way to use a sample-size calculator on DTC traffic is to run it in reverse. Forward asks: given a desired lift, how much sample do I need? Reverse asks: given the sample I will get, what is the smallest lift I could see? The storefront fixes the sample, so reverse matches reality.
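The two directions are the same formula solved for different unknowns. The sketch below shows both, assuming a two-variant conversion test with an even split and the normal approximation at the baseline rate. The function names `sample_needed` and `detectable_lift` are hypothetical, and the storefront figures are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z_sum(power, alpha):
    """Sum of the two-sided significance quantile and the power quantile."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def sample_needed(baseline, relative_lift, power=0.80, alpha=0.05):
    """Forward: sessions per variant needed to detect a desired relative lift."""
    delta = baseline * relative_lift  # absolute lift to resolve
    variance = 2 * baseline * (1 - baseline)
    return ceil(variance * (_z_sum(power, alpha) / delta) ** 2)


def detectable_lift(baseline, n_per_variant, power=0.80, alpha=0.05):
    """Reverse: smallest relative lift detectable with the sample on hand."""
    std_err = sqrt(2 * baseline * (1 - baseline) / n_per_variant)
    return _z_sum(power, alpha) * std_err / baseline


# Reverse direction for a store whose traffic is fixed (illustrative inputs).
monthly_sessions = 50_000
n_per_variant = monthly_sessions * 14 / 30 / 2  # two weeks, even split
print(f"floor after two weeks: {detectable_lift(0.02, n_per_variant):.1%}")

# Forward direction for comparison: sample a 5% relative lift would require.
print(f"sessions per variant for a 5% lift: {sample_needed(0.02, 0.05):,}")
```

Running the reverse function on the traffic the store will actually get gives the floor directly, without guessing a target lift first.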

Compare that MDE to the lift the change is plausibly worth. A checkout-friction fix might justify a 3–5% relative-lift expectation; a hero-image swap usually does not. If MDE exceeds the plausible lift, the test cannot answer the question; the real options are extending runtime, broadening the audience, shipping a bigger change, or skipping the test. This is the diagnostic mature CRO programs run before committing weeks of traffic.
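One way to make that comparison concrete is to ask how many weeks of traffic it would take for MDE to fall to the plausible lift. The sketch below assumes the same simplified model (two variants, even split, normal approximation at the baseline rate). The names `weeks_to_detect` and `recommendation`, the four-week budget, and the traffic figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def weeks_to_detect(baseline, plausible_lift, weekly_sessions,
                    power=0.80, alpha=0.05):
    """Weeks of an even two-variant split until MDE reaches plausible_lift."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    delta = baseline * plausible_lift  # absolute lift to resolve
    n_per_variant = 2 * baseline * (1 - baseline) * (z_sum / delta) ** 2
    return ceil(2 * n_per_variant / weekly_sessions)


def recommendation(weeks_needed, max_weeks):
    """Hypothetical decision rule: test only if the runtime is affordable."""
    if weeks_needed <= max_weeks:
        return "run the test"
    return "extend runtime, broaden the audience, ship a bigger change, or skip"


weekly = 50_000 * 12 / 52  # illustrative: about 50,000 sessions per month
for lift in (0.05, 0.30):
    weeks = weeks_to_detect(0.02, lift, weekly)
    print(f"{lift:.0%} lift: {weeks} weeks -> {recommendation(weeks, max_weeks=4)}")
```

If the required runtime is far beyond what the business can afford, the test cannot answer the question as posed, whatever the eventual p-value says.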

The same logic applies off the storefront: a holdout or incrementality study has its own MDE driven by holdout size and baseline behavior. One last guardrail: MDE is a pre-test parameter, not a result. “The test detected a 12% MDE” is a category error.
