A/B Tests, 2-tail vs 1-tail tests & reporting the variance.

An article from June 2014 on how Optimizely and others only run one-tailed tests, which oversimplifies the conclusions.

Entertaining article, here are some take-aways:

  • Do a two-tailed test (to check for significance in the opposite direction as well)
    • “The short answer is that with a two-tailed test, you are testing for the possibility of an effect in two directions, both the positive and the negative. One-tailed tests, meanwhile, allow for the possibility of an effect in only one direction, while not accounting for an impact in the opposite direction.”
  • Do several tests (to check whether the same result arises each time)
  • Run Tests Longer (to get more variety in the users)
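To make the first take-away concrete, here is a minimal sketch of how one-tailed and two-tailed p-values differ for the same A/B data, using a standard two-proportion z-test. The visitor and conversion counts are made up for illustration:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical numbers: 1000 visitors per arm.
n_a, x_a = 1000, 100   # control: 100 conversions
n_b, x_b = 1000, 120   # variation: 120 conversions

p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

p_one_tailed = norm.sf(z)           # only asks "is the variation better?"
p_two_tailed = 2 * norm.sf(abs(z))  # also allows "is the variation worse?"

print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

Because the two-tailed p-value is double the one-tailed one (when the observed effect is positive), a result can clear a significance threshold under the one-tailed test while the two-tailed test, which accounts for the variation being worse, does not.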

Article: http://blog.sumall.com/journal/optimizely-got-me-fired.html

Some of the Hacker News comments (from January 2016) are interesting too, one mentioning that quantifying the uncertainty is essential:

“In my view, the issue is not one-tail vs two-tail tests, or sequential vs one-look tests at all. The issue is a failure to quantify uncertainty.

Optimizely (last time I looked), our old reports, and most other tools, all give you improvement as a single number. Unfortunately that’s BS. It’s simply a lie to say “Variation is 18% better than Control” unless you had Facebook levels of traffic. An honest statement will quantify the uncertainty: “Variation is between -4.5% and +36.4% better than Control”.

“We just report credible intervals. We find that to be the only honest choice.”


– yummyfajitas
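The interval-style reporting yummyfajitas describes can be sketched with a simple Beta-Binomial simulation. The counts below are made up for illustration, and the uniform Beta(1, 1) priors are an assumption, not anything the comment specifies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts; Beta(1, 1) priors on each conversion rate.
n_a, x_a = 1000, 100   # control
n_b, x_b = 1000, 120   # variation

# Draw from the Beta posterior of each arm's conversion rate.
post_a = rng.beta(1 + x_a, 1 + n_a - x_a, size=100_000)
post_b = rng.beta(1 + x_b, 1 + n_b - x_b, size=100_000)

lift = (post_b - post_a) / post_a          # relative improvement over control
lo, hi = np.percentile(lift, [2.5, 97.5])  # 95% credible interval

print(f"Variation is between {lo:+.1%} and {hi:+.1%} better than Control")
```

The point of the interval is exactly the one in the quote: instead of a single "X% better" number, the report says the lift is plausibly anywhere in a range that, at realistic traffic levels, often still straddles zero.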


Another point worth raising is the impact on the Lifetime Value (LTV) of the user. So, yes, we got a higher CTR in the test, but what is the impact on LTV? Is it the right kind of user who is clicking on that button?
