Periscopix

“You can do anything, but not everything”

David Allen

Rebecca de Yong


So what’s the problem?

Unfortunately for all of us, the human brain is both very clever and very stupid at the same time. In the context we’re talking about here, you can imagine a person’s brain as a massively sophisticated pattern recognition device. Every day you see dozens or hundreds of people, and you can often recognise their faces immediately. You can differentiate between an office chair and a dining chair, between a credit card and a business card, or between a Villa shirt and a West Ham shirt.

These are all pairs that have essentially the same ingredients: both chairs have four legs, a seat, a back, arms and padding. They are both roughly the same size and have the same basic geometry. But when you look at them, you can see immediately which is which, because your brain picks out the patterns that correspond to previous examples you’ve seen of each. It doesn’t tell you it’s doing this, of course; it just reports back: “yep, office chair.”

This kind of problem is incredibly difficult for computers. You can give the cleverest pattern recognition software ever made hundreds or thousands of images, and it will still struggle to come close to a human’s ability to do this kind of thing. With a few exceptions.

Those exceptions point to the real problem. Your brain is great at quickly identifying something and comparing it against the features it knows, but it’s terrible at details. Do you think you could compare fingerprints as well as a machine?

Because your brain analyses patterns by this “leaping to conclusions without checking the details” method, it leaves you open to gross errors in all kinds of pattern recognition. Your brain isn’t as smart as it tells you it is. It’s constantly looking for patterns and coincidences. As soon as it thinks it has found one, it won’t tell you “Dude, that thing has four legs, a seat, a back, and black padding like an office chair I saw two months ago.” It’ll just tell you “yep, office chair.” If it’s wrong, you never realise that the thing was only similar in a few ways, because your brain has told you it’s the same.

The same problem occurs when you’re looking at data. Your brain is desperate to find matches and coincidences. It’s looking for things that mean dataset A is the same as dataset B, or that the before data is different from the after data. Your brain will see a match or a difference whether or not one is really there.

Coincidences confuse your brain. You assume that a coincidence is an unlikely event. Like seeing a school friend at the airport, both going on holiday to the same place at the same time. It’s so unlikely it becomes a noteworthy event!

Or does it? Think for a minute about how many other coincidences could have happened. Think of all the people you used to go to school with, or college, or used to work with, or friends of friends, distant relations, that person you see every week at the gym… Wouldn’t it be equally unlikely if you’d seen one of them instead? Now what if you could have seen them in one of the many other places you might go? A restaurant, the supermarket, a meeting at work, a pilates class, a football match. Any one combination of these might seem unlikely, but there are so many possible combinations that the probability of one of them happening is actually surprisingly high. Chances are that this sort of thing has already happened to you more than once. For a fuller explanation of this effect, see this You Are Not So Smart article about what is known as “The Texas Sharpshooter Fallacy”, where your brain picks out the few things that link two events and ignores the overwhelming number of things that don’t.
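To put rough numbers on the airport example, here is a small Python sketch. The counts (250 people you might recognise, 20 places you might bump into them, and a 1-in-10,000 chance for any one specific person-and-place pairing) are made up purely for illustration, and the sketch assumes every pairing is independent, which real life isn’t.

```python
def prob_at_least_one(p_single: float, n_opportunities: int) -> float:
    """Chance that at least one of many independent, equally unlikely events happens."""
    # P(none happen) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n
    return 1 - (1 - p_single) ** n_opportunities


# Hypothetical numbers, for illustration only
people = 250           # school friends, ex-colleagues, gym regulars, ...
places = 20            # airports, restaurants, supermarkets, ...
p_single = 1 / 10_000  # chance of meeting one specific person in one specific place

print(f"One specific meeting: {p_single:.2%}")
print(f"At least one meeting: {prob_at_least_one(p_single, people * places):.0%}")
```

With these invented numbers, any single meeting has a 0.01% chance, yet the chance of at least one of the 5,000 possible meetings happening comes out at roughly 39%.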

What this all means is that when you are trying to analyse data (in AdWords or elsewhere), you can’t just look at the results of a test and say “this worked” or “this failed”. Quite simply, your brain is probably tricking you.

How this fixes it

The launch of AdWords Campaign Experiments (ACE) lets you rely on statistics to do the number crunching for you.

You may know about the “Normal” distribution, also known as the Gaussian distribution. This is a shape of curve that looks like a bell (below). It’s important because averages of random results tend to follow it (a result known as the central limit theorem). Take throwing a die: it has an expected average (mean) result of 3.5, and every face is equally likely, but the mean of a few throws won’t be the same every time. If you did it a bazillion times, the mean would tend towards the true mean of 3.5, but each individual set of 10, 100, or 1000 throws won’t have a mean of exactly 3.5. Some will be higher, some will be lower. Most will be close, but some will be outliers. The result (the distribution of those means) is approximately the bell-shaped probability density function (pdf) below.
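If you’d like to see this for yourself, the Python sketch below simulates it: it throws a fair die in sets of 10, 100 and 1000, records each set’s mean, and compares how spread out those means are with the theoretical value (the standard deviation of one throw, the square root of 35/12 or about 1.71, divided by the square root of the number of throws). The seed and the 500 sets per size are arbitrary choices.

```python
import math
import random
import statistics


def sample_means(throws_per_set: int, n_sets: int, rng: random.Random) -> list[float]:
    """Throw a fair die `throws_per_set` times, record the mean, repeat `n_sets` times."""
    return [
        sum(rng.randint(1, 6) for _ in range(throws_per_set)) / throws_per_set
        for _ in range(n_sets)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
single_throw_sd = math.sqrt(35 / 12)  # standard deviation of one fair-die throw

for n in (10, 100, 1000):
    means = sample_means(n, 500, rng)
    print(
        f"{n:>4} throws per set: mean of means = {statistics.mean(means):.3f}, "
        f"spread = {statistics.stdev(means):.3f} "
        f"(theory: {single_throw_sd / math.sqrt(n):.3f})"
    )
```

Bigger sets give means that cluster more tightly around 3.5, which is exactly why more data lets you draw firmer conclusions.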

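Now that we know that the means of repeated samples take (approximately) this form, we can work backwards from it. For instance, if we compare the result of an experiment against this curve, drawn as if no experiment had been performed, we can estimate how likely it is that a result that extreme would have happened by chance alone. If that is very unlikely, we can be confident, though never completely certain, that the experiment had an effect. Let’s use an example from AdWords.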

You have an ad with an average click-through rate of 3% in position 3. You believe that will decrease if you run the ad in position 4. But on any one day your click-through rate could be 2%, 3%, 4%, sometimes even further off the average.

If you run in position 4 and on the first day you have a click-through rate of 2%, that could be because you lowered the position. It could also be just because some days you have lower click-through rates, and the average in your new position could be the same as in your old position. How many days would you have to run in the lower position to be confident of the difference?
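To see why one bad day proves very little, here is a sketch that works out exactly how often an ad with a true 3% click-through rate would show 2% or less on a given day. It assumes a hypothetical 200 impressions a day and that each impression is clicked independently, which is a simplification.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k or fewer clicks from n impressions at true CTR p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


daily_impressions = 200  # hypothetical traffic level
true_ctr = 0.03          # the position 3 average from the example

# At 200 impressions, a CTR of 2% or lower means 4 clicks or fewer
max_clicks = round(0.02 * daily_impressions)
print(f"{prob_at_most(max_clicks, daily_impressions, true_ctr):.0%}")
```

With these assumed numbers it prints 28%: more than one day in four would look like a 2% day even if nothing had changed at all.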

If your click-through rate data is a collection of daily averages, then your data from position 3 will likely look much like the normal distribution above. If your click-through rate data in position 4 falls far enough to one side of this bell, you can deduce that it is quite unlikely to just be an outlier from the same prior click-through rate, and that the true click-through rate has now changed.
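A sketch of the “far enough to one side” check for a single day, assuming your position 3 history is a list of daily click-through rates (the numbers here are invented). It fits a normal curve with Python’s `statistics.NormalDist.from_samples` and reads off how much of that curve lies at or below the new day. This only judges one day against the old spread; comparing averages over many days is the job of the t-test.

```python
from statistics import NormalDist


def lower_tail_probability(value: float, baseline_days: list[float]) -> float:
    """How often a day at or below `value` would occur if the baseline hadn't changed.

    Fits a normal curve to the baseline daily CTRs and reads off its lower tail.
    """
    baseline = NormalDist.from_samples(baseline_days)  # sample mean and stdev
    return baseline.cdf(value)


# Hypothetical daily CTRs from ten days in position 3
position_3_days = [0.035, 0.020, 0.030, 0.040, 0.025, 0.030, 0.045, 0.015, 0.030, 0.030]

for new_day in (0.020, 0.010, 0.005):
    p = lower_tail_probability(new_day, position_3_days)
    print(f"A {new_day:.1%} day: {p:.2%} chance under the old CTR")
```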

A common test for this is the t-test. A t-test assumes the averages being compared are approximately normally distributed, and estimates how likely it is that a difference between two averages this large would appear if both sets of data came from the same underlying population. If that is unlikely, you can conclude that your experiment probably caused a change, and that the change wasn’t just chance.
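Here is a self-contained sketch of a t-test in Python, comparing hypothetical daily click-through rates from ten days in each position. It uses Welch’s variant of the t-test, which doesn’t assume both positions have the same day-to-day spread, and works out the p-value with simple numerical integration so that it only needs the standard library. The function names are our own, not from any library.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Student's t density (lgamma keeps large df from overflowing)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """P(|T| >= |t|), via Simpson's rule on the density from 0 to |t|."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Welch's t-test: returns (t statistic, degrees of freedom, two-sided p-value)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a) / n1, statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch-Satterthwaite
    return t, df, two_sided_p(t, df)


# Hypothetical daily CTRs (%) for ten days in each position
position_3 = [3.1, 2.9, 3.4, 2.8, 3.0, 3.2, 2.7, 3.3, 3.0, 2.9]
position_4 = [2.5, 2.8, 2.2, 2.6, 2.9, 2.4, 2.3, 2.7, 2.5, 2.6]

t, df, p = welch_t_test(position_3, position_4)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.5f}")
```

A small p-value means a gap this large would rarely appear if both positions really had the same click-through rate. If SciPy is installed, `scipy.stats.ttest_ind(position_3, position_4, equal_var=False)` runs the same Welch test without the hand-rolled integration.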

So now we know that our brains are easily fooled by patterns, and which statistics we need to determine whether our experiments have worked. Next, we need to consider other sources of bias in the data.

Most importantly, we need to control for other changes. If we ran in position 3 in March and position 4 in April, and our t-test determined that our click-through rate did in fact drop, can we definitely say that the change in position caused it? Maybe there was more advertiser competition. Maybe our ad text changed. To know that the position was responsible, we’d need to make sure that only the position changed. Until now this has usually been impossible in AdWords, and we’ve been compelled to make these flawed before/after comparisons.

ACE lets us set up these tests without having to make before/after comparisons. We can test a different position on the same keyword at the same time. With a 50/50 split, AdWords gives 50% of traffic to the “control” group (where nothing changed) and 50% to the “experiment” group, with the new bid. So both groups see the same ad, run against the same advertiser competition, and so on. All the other potential changes apply equally to both groups, so a statistically significant difference between the groups can be attributed to the test.
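A toy simulation makes the difference between the two approaches concrete. In this made-up model the move from position 3 to position 4 has no effect at all; the only real change is extra competition in April that cuts everyone’s click-through rate by 20%. The traffic figures and the 20% are assumptions for illustration, and the real ACE traffic split is handled by Google, not by code like this.

```python
import random


def simulate_ctr(impressions: int, true_ctr: float, rng: random.Random) -> float:
    """Observed CTR when each impression is clicked independently with probability true_ctr."""
    clicks = sum(rng.random() < true_ctr for _ in range(impressions))
    return clicks / impressions


def before_after_gap(rng: random.Random, impressions: int = 150_000,
                     base_ctr: float = 0.03, april_factor: float = 0.8) -> float:
    """March in position 3 vs April in position 4: the gap mixes position AND competition."""
    march = simulate_ctr(impressions, base_ctr, rng)
    april = simulate_ctr(impressions, base_ctr * april_factor, rng)
    return march - april


def concurrent_split_gap(rng: random.Random, impressions: int = 150_000,
                         base_ctr: float = 0.03, april_factor: float = 0.8,
                         split: float = 0.5) -> float:
    """April only: each impression is randomly assigned to control or experiment,
    so both groups face exactly the same extra competition."""
    true_ctr = base_ctr * april_factor
    counts = {"control": [0, 0], "experiment": [0, 0]}  # [impressions, clicks]
    for _ in range(impressions):
        group = "experiment" if rng.random() < split else "control"
        counts[group][0] += 1
        counts[group][1] += rng.random() < true_ctr  # True counts as one click
    ctr = {g: clicks / imps for g, (imps, clicks) in counts.items()}
    return ctr["control"] - ctr["experiment"]


rng = random.Random(7)
print(f"Before/after gap:     {before_after_gap(rng):+.4f}")
print(f"Concurrent split gap: {concurrent_split_gap(rng):+.4f}")
```

The before/after comparison should show a drop of around 0.6 percentage points that has nothing to do with position, while the concurrent split should show a gap close to zero, because the extra competition hits both groups equally.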

When to use it

There is a whole host of things you might want to test using ACE: bids, ad texts, match types. Each of these can benefit from this kind of analysis.

If you have an ad that you think works well for certain keywords, but seems to be performing poorly for the ad group overall, move this ad and those keywords into a new ad group as your experiment. You can compare the control (all together) versus the experiment (split apart) to see if this ad works better with just these keywords.

If one of your keywords has a great conversion rate but not enough traffic, and broad match looks like it might be too risky, test it. In the control, the keyword can stay in exact or phrase match; in the experiment, you can try it in broad match to see the effect on click-through rate and conversion rate.

Just like in the earlier example, trialling a keyword in a new position is likely to be the best use for ACE. An individual keyword may or may not change click-through rate or conversion rate based on its position. By using ACE, you can be confident that an improvement in click-through rate was due to a higher ad position, and not, for example, a result of increased brand recognition.

How to use it

When looking at your AdWords campaign, in the Settings tab you should see a new option in the advanced settings called “Experiment”. This is where you set up your experiment.

Click “Specify experiment settings” to expand the box and get access to the following settings:

  1. Experiment name - for identifying this experiment in the results
  2. Control/experiment split - for determining how many people should fall into each group
  3. Start date
  4. End date - the longer you run an experiment, the more solid your data will be, but you may want to use a shorter time period to be able to implement positive changes sooner
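How long is “long enough”? One rough way to plan an end date is the standard sample-size formula for comparing two proportions. The sketch below assumes a 3% control click-through rate, a change to 2.5% that you want to be able to spot, 95% significance, 80% power and 1,000 impressions a day; all of these numbers are illustrative.

```python
import math
from statistics import NormalDist


def impressions_per_group(p_control: float, p_experiment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate impressions each group needs to detect a CTR change
    (standard normal-approximation formula for comparing two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # chance of spotting a real change
    variance = p_control * (1 - p_control) + p_experiment * (1 - p_experiment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_experiment) ** 2)


def days_needed(p_control: float, p_experiment: float, daily_impressions: int,
                split: float = 0.5, **kwargs) -> int:
    """Days until the smaller of the two groups has collected enough impressions."""
    needed = impressions_per_group(p_control, p_experiment, **kwargs)
    per_group_per_day = daily_impressions * min(split, 1 - split)
    return math.ceil(needed / per_group_per_day)


# Hypothetical: 3% CTR in the control, hoping to detect a drop to 2.5%,
# with 1,000 impressions a day split 50/50
print(impressions_per_group(0.03, 0.025))
print(days_needed(0.03, 0.025, daily_impressions=1000))
```

With these assumptions it asks for roughly 16,800 impressions per group, or about five weeks at 1,000 impressions a day. Halving the size of the change you want to detect roughly quadruples the data you need, and an uneven control/experiment split stretches the experiment further, because the smaller group fills up slowest.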

You can then click “Change keywords, ad groups” to go into a normal campaign window for making changes. Make the changes you want in this window; when you start running the experiment, these changes will apply only to the experiment group.

Once your results have built up, click “Apply” to ditch the control group and run only with the changes you made. If it turns out the changes weren’t positive, then instead use “Delete” to switch back to the originals.

How to interpret results

This is the easy bit. Google will start showing you results based on “statistically significant differences”. This means that the experiment version is different enough from the control version that the difference is unlikely to be explained by chance alone, so the results probably aren’t just coincidence. Google will let you know what has happened, and how confident you can be that the change made a real difference.
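Google handles this for you, and exactly which test ACE runs isn’t spelled out here. As an assumption, a standard two-proportion z-test on clicks and impressions is one common way to decide whether two click-through rates are significantly different; the sketch below shows the idea with made-up numbers.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(control_clicks: int, control_impressions: int,
                          exp_clicks: int, exp_impressions: int) -> tuple[float, float]:
    """Compare two CTRs; returns (z score, two-sided p-value)."""
    p_control = control_clicks / control_impressions
    p_exp = exp_clicks / exp_impressions
    # Pooled CTR: the best single estimate if there were no real difference
    pooled = (control_clicks + exp_clicks) / (control_impressions + exp_impressions)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_impressions + 1 / exp_impressions))
    z = (p_exp - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 300 clicks vs 240 clicks from 10,000 impressions each
z, p = two_proportion_z_test(300, 10_000, 240, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
for confidence in (0.95, 0.99, 0.999):
    print(f"significant at {confidence:.1%}? {p < 1 - confidence}")
```

The smaller the p-value, the less plausible it is that the gap between control and experiment is pure coincidence.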

That’s it. Google does the rest for you. No statistical models needed. No t-tables, no confidence intervals, and no linear regression. Google will give you a little picture and you’ll be able to see the result.

Voila!

The next bit is up to you

How are you going to use this? How are you going to improve your campaign with ACE?

None of what we’ve talked about here will be new to experienced AdWords managers, but it’s always been difficult and time-consuming to achieve, and in some cases impossible to truly split visitors into control and experiment groups. So this will save everybody time and effort in finding out how to make real, significant differences to campaigns.

Good luck!

