By Dan Williams

As a proud data-driven agency, we need to be confident that we are not misled by data. Statistical rigour gives us confidence in our results and helps us make the right decisions for our accounts.

Why Bother?

Clients care about statistical significance, and so should a good account manager. You might have set up the most groundbreaking PPC experiment ever, but without statistically rigorous analysis you don’t have a result, just a mess of data. Did your result come about purely by chance? If you repeated your experiment 10 times, would you get the same result every time? Maybe, maybe not. Statistical tests are how you get to the bottom of these questions.

To prove a point, I set up the most useless A/B experiment ever. For a single ad group, I split traffic between two identical sets of ad copy, making no changes to either. Users should react to both sets of ads in exactly the same way. So why, after several days of running this test, were the click-through rates (CTRs) completely different? Random variation is responsible: the CTRs differed for no reason other than irregularity and chance. I would have known this had I carried out the correct statistical test (a chi-squared test)! Had the two sets of ad copy actually been different, I might have taken the data at face value and marched ahead with the result, potentially changing my entire strategy based on nothing more than blind luck. Incidentally, the p-value from the chi-squared test for this experiment was 0.076, above the standard 0.05 significance level, so the difference was not statistically significant.
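To see how chance alone can pull two identical ads apart, here is a minimal Python sketch that simulates an A/A test. The 4% true CTR and the 5,000 impressions per ad are made-up assumptions, not figures from the experiment above. Both "ads" share exactly the same true CTR, so any gap between the observed CTRs is pure random variation.

```python
import random


def simulate_clicks(impressions: int, true_ctr: float, rng: random.Random) -> int:
    """Count clicks when each impression is clicked independently with probability true_ctr."""
    return sum(rng.random() < true_ctr for _ in range(impressions))


# Hypothetical settings: two identical ads with the same underlying CTR.
TRUE_CTR = 0.04
IMPRESSIONS = 5_000

rng = random.Random(2017)  # fixed seed so the run is repeatable
clicks_a = simulate_clicks(IMPRESSIONS, TRUE_CTR, rng)
clicks_b = simulate_clicks(IMPRESSIONS, TRUE_CTR, rng)

ctr_a = clicks_a / IMPRESSIONS
ctr_b = clicks_b / IMPRESSIONS
print(f"Ad A CTR: {ctr_a:.2%}   Ad B CTR: {ctr_b:.2%}")
```

Change the seed and rerun it a few times: the observed CTRs move around from run to run even though nothing about the two ads differs.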

What is a p-value anyway?

Many statistical tests eventually spit out a single number: the p-value. This is the probability of seeing a result at least as extreme as yours if there were really no difference, i.e. if chance alone were at work. The lower the p-value, the harder your result is to explain by chance, and the stronger the evidence that it is real. But how low must a p-value be before we can call a result statistically significant?

This is what significance levels are for. In 1925 Ronald Fisher somewhat arbitrarily settled on 0.05 as a convenient significance level, and this is generally what we use today: any p-value below 0.05 is significant, anything above is not. Significance levels can be adjusted depending on how confident you want to be in your results, but 0.05 is the standard.

Beware of setting your significance level too high! The higher it is, the more likely you are to accept a false positive result as significant!
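One way to see what the significance level controls is to run many A/A tests, where there is truly no difference, and count how often the p-value still falls below the threshold. The sketch below does this in Python under made-up assumptions (a 5% true CTR, 1,000 impressions per ad, 500 repeated experiments). It uses a pooled two-proportion z-test, which for a 2×2 table of clicks and non-clicks gives the same p-value as a chi-squared test without continuity correction. Roughly 5% of experiments should look "significant" at 0.05, and raising the level to 0.10 should roughly double that share of false positives.

```python
import math
import random


def two_proportion_p_value(clicks_a, imps_a, clicks_b, imps_b):
    """Two-tailed p-value from a pooled two-proportion z-test
    (equivalent to a 2x2 chi-squared test without continuity correction)."""
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if se == 0:
        return 1.0  # no variation at all, e.g. zero clicks on both ads
    z = (clicks_a / imps_a - clicks_b / imps_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


def simulate_clicks(impressions, true_ctr, rng):
    """Count clicks when each impression is clicked with probability true_ctr."""
    return sum(rng.random() < true_ctr for _ in range(impressions))


# Hypothetical A/A setup: both ads share the same true CTR.
TRUE_CTR, IMPRESSIONS, EXPERIMENTS = 0.05, 1_000, 500
rng = random.Random(42)  # fixed seed for repeatability

p_values = [
    two_proportion_p_value(
        simulate_clicks(IMPRESSIONS, TRUE_CTR, rng), IMPRESSIONS,
        simulate_clicks(IMPRESSIONS, TRUE_CTR, rng), IMPRESSIONS,
    )
    for _ in range(EXPERIMENTS)
]

for alpha in (0.01, 0.05, 0.10):
    false_positive_rate = sum(p < alpha for p in p_values) / EXPERIMENTS
    print(f"alpha = {alpha:.2f}: {false_positive_rate:.1%} of A/A tests look 'significant'")
```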

What tests can I use?

I won’t go into much detail on the statistical theory and mathematics behind these tests. If you understand what tests to use, when to use them, and how to interpret the results, you’re good to go.

Generally, account managers need to ask 3 types of question:

1) Difference in means – e.g. Do we get fewer clicks on the weekend?

Here we are looking for a difference between two means: mean clicks on the weekend vs mean clicks on weekdays. The test we want here is a ‘t-test’ (more specifically, an unpaired, two-tailed t-test). To carry out this test in Excel, use the following formula:

=T.TEST(GROUP 1, GROUP 2, 2, 2)

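GROUP 1 and GROUP 2 in this example would be lists of click counts (e.g. per day) received on weekends and on weekdays over a suitable timeframe, but they can be swapped for whatever it is you’re testing. The third argument (2) requests a two-tailed test, and the fourth (2) an unpaired test that assumes both groups have equal variance (use 3 instead if they don’t). The formula returns a p-value, which you can then compare to your significance level to determine significance. Good online calculators are also available.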
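If you would rather see what that formula computes, here is a minimal Python sketch of an unpaired, two-tailed t-test with pooled (equal) variance, the same kind of test that =T.TEST(GROUP 1, GROUP 2, 2, 2) runs. It avoids third-party libraries, so the p-value comes from numerically integrating Student's t density with Simpson's rule; treat it as an illustration rather than a replacement for proper statistics software. The click counts are made up.

```python
import math
import statistics


def t_two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value for Student's t with df degrees of freedom.
    Integrates the t density from 0 to |t| with Simpson's rule (steps must be even)."""
    t = abs(t)
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


def unpaired_t_test(group1, group2):
    """Unpaired two-tailed t-test assuming equal variances.
    Returns (t statistic, p-value); raises ZeroDivisionError if both groups have zero variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, t_two_tailed_p(t, df)


# Hypothetical daily click counts, made up for illustration only.
weekend_clicks = [112, 98, 105, 120, 101, 95]
weekday_clicks = [130, 142, 125, 138, 129, 135, 140, 127, 133, 136]

t_stat, p_value = unpaired_t_test(weekend_clicks, weekday_clicks)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Significant at 0.05" if p_value < 0.05 else "Not significant at 0.05")
```

If your two groups clearly have different spreads, the equal-variance assumption is shaky; that is the case Excel's type 3 (Welch's t-test) is meant for.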

2) Association – e.g. Does our conversion rate change with the temperature?

Here we want to test for a correlation between two variables, which in this example are temperature and conversion rate. Typically, we would use a Pearson correlation test to answer this. In Excel, you can use the following formula:

=PEARSON(GROUP 1, GROUP 2)

GROUP 1 here might be the list of conversion rates, each paired with its corresponding temperature in GROUP 2. This returns a “correlation coefficient” on a scale of -1 to 1, which tells you whether your two variables are positively correlated (values close to 1), negatively correlated (values close to -1), or not correlated at all (values close to 0).
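The coefficient itself is simple arithmetic: the sum of the products of each pair's deviations from the two means, divided by the product of the two spreads. The Python sketch below spells that out; the paired conversion rates and temperatures are hypothetical values made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the pairs around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical paired observations, made up for illustration only.
conversion_rates = [0.021, 0.024, 0.019, 0.030, 0.027, 0.033, 0.025]  # GROUP 1
temperatures_c = [12, 15, 10, 22, 18, 25, 16]  # GROUP 2, same days

r = pearson_r(conversion_rates, temperatures_c)
print(f"r = {r:.3f}")
```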

To get a p-value for this, you need an extra step (converting the coefficient into a t statistic and checking it against a t distribution), which unfortunately isn’t easy to do in Excel. Unless you have statistics software or in-house tools, you’re going to have to use one of the many online tools.

3) Difference in proportions – e.g. Which ad copy has the higher CTR?

This is something you could answer with an A/B test. In PPC, A/B tests are regularly used to incrementally improve KPIs such as click-through rate and conversion rate via ad copy tests or landing page tests. In the case of an ad copy test, we look to see whether the proportion of clicks to impressions (CTR) for one ad copy is higher than for another. For this we use a chi-squared test. Unfortunately, there isn’t a super quick way to do this in Excel, so unless you have an in-house tool (like we do!) or are handy with statistics software such as R or Minitab, you will want to use an online tool.
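To show what such a tool does under the hood, here is a minimal Python sketch of a chi-squared test on a 2×2 table of clicks and non-clicks for two ads. It is not our in-house tool, just an illustration: the click and impression counts are hypothetical, and it uses the fact that with one degree of freedom the chi-squared tail probability can be written with the complementary error function.

```python
import math


def chi2_p_value_1df(statistic):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


def chi_squared_ctr_test(clicks_a, imps_a, clicks_b, imps_b):
    """Pearson chi-squared test comparing two CTRs, without continuity correction.
    Returns (statistic, p_value). Raises ZeroDivisionError if there are no clicks
    (or no non-clicks) at all, since an expected count would then be zero."""
    observed = [
        [clicks_a, imps_a - clicks_a],  # ad A: clicks, non-clicks
        [clicks_b, imps_b - clicks_b],  # ad B: clicks, non-clicks
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic, chi2_p_value_1df(statistic)


# Hypothetical ad copy test, made up for illustration only.
stat, p_value = chi_squared_ctr_test(clicks_a=120, imps_a=4_000, clicks_b=150, imps_b=4_000)
print(f"chi-squared = {stat:.3f}, p = {p_value:.4f}")
print("Significant at 0.05" if p_value < 0.05 else "Not significant at 0.05")
```

Many tools (for example R's chisq.test and SciPy's chi2_contingency, by default) apply Yates' continuity correction to 2×2 tables, so their p-values can come out slightly larger than this uncorrected version.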

Fortunately, much of the time AdWords and DoubleClick Search will do the work for you, letting you know whether an experiment is significant. The symbols next to the metrics in the above image indicate whether there are any statistically significant differences (you can hover over them in AdWords for more info).