Epistemology and A/B Testing

A/B testing is all the rage in certain web development circles. Naturally, when something becomes popular the criticism starts. I’ve read some unconvincing attacks on A/B testing recently, as well as some good ones, so I want to lay down my thoughts on what A/B testing is and what it isn’t.

The general method of A/B testing on the web is as follows:

  • Decide on a change to make to the site. This could be as small as the wording of a title or as large as the entire navigational structure of the site.
  • Decide what outcome you want to measure. Typical examples are purchases, time spent on the site, and number of repeat visits.
  • Randomly assign each visitor to one of the two (or more) versions of the site.
  • Measure how the different versions stack up against the outcome of interest.
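To make the steps above concrete, here is a minimal sketch of the assignment and measurement parts. It assumes visitors are identified by an ID string and that the outcome is binary (for example "made a purchase"). The names `assign_variant`, `Experiment`, `VARIANTS` and the experiment label `"new-title"` are hypothetical, chosen for illustration. Hashing the visitor ID is one common way to randomize while keeping a returning visitor on the same version. It is not the only way.

```python
import hashlib
from collections import defaultdict

# Hypothetical: the two versions of the site under test.
VARIANTS = ("A", "B")


def assign_variant(visitor_id: str, experiment: str = "new-title") -> str:
    """Map a visitor to a variant pseudo-randomly but consistently.

    Hashing the (experiment, visitor) pair spreads visitors across variants
    effectively at random, while a returning visitor always sees the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


class Experiment:
    """Tallies a binary outcome (e.g. 'made a purchase') per variant."""

    def __init__(self) -> None:
        self.visitors = defaultdict(int)     # variant -> number of visitors
        self.conversions = defaultdict(int)  # variant -> number who converted

    def record(self, visitor_id: str, converted: bool) -> None:
        """Record one visitor's outcome under the variant they were shown."""
        variant = assign_variant(visitor_id)
        self.visitors[variant] += 1
        self.conversions[variant] += int(converted)

    def rates(self) -> dict:
        """Conversion rate per variant, skipping variants with no visitors."""
        return {
            v: self.conversions[v] / self.visitors[v]
            for v in VARIANTS
            if self.visitors[v] > 0
        }


if __name__ == "__main__":
    exp = Experiment()
    for i in range(1000):
        # Synthetic outcomes purely for illustration: every 10th visitor converts.
        exp.record(f"visitor-{i}", converted=(i % 10 == 0))
    print(exp.rates())
```

The sketch assumes each visitor is recorded once. A real system would log the assignment when the visitor arrives and attach the outcome later, but the deterministic hash means both steps agree on the variant.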

This is a fairly simple thing. Critics of A/B testing usually claim that it is only good for small changes. It cannot, they claim, be used for business-changing disruptive innovation. The critics are wrong. They are confusing the principles underlying A/B testing with the common implementations of the idea.

How We Acquire Knowledge

There are basically three means by which we come to acquire knowledge:

  1. By appealing to authority.
  2. By constructing statements consistent with assumed first principles.
  3. By making observations on the effects of actions.

The third method has proven to be vastly superior when studying the natural world, and is the basis of the method known as science. If you are reading this, you are seeing evidence of this method's efficacy: the computer you are using is the result of a few hundred years of scientific development.

The primary mechanism of science is the experiment. An experiment involves performing some action in the world and measuring its effect. If different actions lead to different outcomes, one typically performs a statistical analysis of the results to determine whether one is justified in believing the differences represent a true difference or are just the result of chance.
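The post doesn't say which statistical analysis to use. For a binary outcome such as conversion, one common choice is a two-proportion z-test. The sketch below is that test, using the normal approximation and a pooled rate under the null hypothesis that both versions share the same true rate. The function name `two_proportion_z_test` and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). The normal approximation is only reasonable when
    each group has a fair number of both conversions and non-conversions.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate of the common rate if there is no real difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; need both outcomes observed")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The example uses made-up numbers: 100 of 1,000 visitors convert on A and 130 of 1,000 convert on B. For these counts z comes out at about 2.1 and p at about 0.035. If the two versions truly had the same rate, a difference at least this large would arise by chance only about 3.5% of the time.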

A/B Is Science

A/B testing is science. A/B testing is about taking an action and measuring its effects. That is, doing an experiment. One can experiment with small things, like the colour of a button on a web site. One can also experiment with large things, like business models, new technology, and other disruptive changes.

The critics see the small experiments used to market A/B testing to internet businesses and think they are the totality of the method. They are right that companies usually don’t A/B test large changes. It is unusual to run two or more different business models, for example. That doesn’t mean these experiments aren’t done, but they are typically done at the level of the market rather than the individual company. Different companies, called competitors, experiment with a particular combination of strategy, model, and implementation, and the market measures their effect. Sometimes big companies will run these experiments internally. Google, for example, is currently experimenting with both Android and Chrome OS in more or less the same space. Complex experiments like this are neither controllable nor repeatable, so the methods of social science are preferred over those of the hard sciences, but they still fall within the scientific paradigm.

A/B Testing Isn’t All That

I’ve said A/B testing is science, and science is great. However, I do think the current implementation of A/B testing, as used by web companies, is flawed. The reason is that we’re usually interested in decision making not hypothesis testing, and with decision making we want a different setup than is currently used. Exploring this is for another post.


6 Responses to “Epistemology and A/B Testing”

  1. Paras Chopra says:

    Wow, great post! Wonder why it didn’t get any attention on the web. I am totally sick of arguments against red v/s blue tweaks being passed off as arguments against A/B testing.

    Let’s face it: A/B testing is a methodology. It doesn’t dictate what you actually test. Looking forward to your detailed arguments in the next post. I have also been planning to write something on this for a long time, and I think now is the right time to do so.

  2. Oli Gardner says:

    Love your style of writing. Will be sharing this big time.
    Your writing style and sensible thinking kept me reading, but the phrase that struck me most was:

    “we’re usually interested in decision making not hypothesis testing”

    Would love to see a post from you on the concept of hypothesis testing. I think you are really getting to the crux of it with that.

    I think you can definitely achieve business-changing disruptive innovation (to quote another of your excellent lines) using A/B testing; it comes down to the thought process behind the tests and how willing you are to stretch the boundaries of each test. Yes, you can change a small element of the page, but why not do a new exploratory study of your target market and pull off a new page with radically different messaging?

    You might not know which individual piece of the puzzle gave you the conversion lift, but with MVT (multivariate testing) you are throwing so many variations out there that not only will some of them be incongruent with the intended page message, but you are also left with a sense that “this page rocked” without really knowing why (except that you were told it worked better). But do you really understand why? No more so than with A/B, in my opinion.

    On another note (and drawing on Paras’ point), it is just a methodology. You can never predict the ways in which, and by whom, it will be applied, which comes back to natural selection: not only will some pages perform better than others, but certain people are in a better position than others to formulate and ideate their way to more meaningful hypotheses.

  3. I cannot agree more with the statement that “we’re usually interested in decision making not hypothesis testing, and with decision making we want a different setup than is currently used”.

    We make decisions under uncertainty. What we need is information (be it from experiments, market research, A/B testing, etc.) that reduces the uncertainty enough to be worth our while.

    For web testing, there are many methodological flaws with A/B testing, including treating statistical significance as economic significance, using the wrong statistical procedure, and using whale-sized samples.
    I have explored the overall usefulness of A/B testing in decision problems using Bayesian statistics.

    I will be happy to point you to some of my articles and/or have a conversation if you are exploring, as you say, hypothesis testing and decision making under uncertainty.

    -rags
    http://twitter.com/pricingright

  4. Excellent article, Noel. Perfect epistemology of A/B testing, which truly is a form of experimentally derived, experiential knowledge.

    Your last paragraph is great too. Very much looking forward to reading your reasoning (and thoughts) behind the statement that “the current implementation of A/B testing, as used by web companies, is flawed”. The way it’s phrased now allows a lot of room for discussion without actually hearing out what you have to say in the first place.

  5. Noel says:

    Thanks all for the kind words. I’m definitely guilty of trying to end on a cliffhanger with that last paragraph; I’ll be posting more on this issue as I get time. Rags, I’m interested to see what your take on this issue is. I’m working on a small bibliography of relevant literature and I’ll let you know when it’s done.

  6. [...] one isolated variable you can be more confident in the results. But lately there has been lots of discussion about the downsides of this incremental approach. The argument is that if you continually test [...]
