Periscopix


Mylene Curie


One of the best features of Google Analytics is the speed with which you can put together almost any report you need. You can select the metrics you want to see, drag and drop the segments you’re interested in, and get all kinds of data presented to you in seconds. But to keep the interface this fast, Google Analytics puts a hard limit on how much calculating it will do.

If the total results exceed a certain amount (especially likely on very high traffic sites), you may see a “This report is based on sampled data” notice on your report. Google can’t commit to crunching all of those numbers every time anybody asks for a report. It’s not feasible. The problem then is how they choose to sample your data. They are suddenly in control of it, not you. What if they used the morning’s data and extrapolated to the afternoon? What if they used the first X visitors to your site, but not the ones after you made a crucial change? So they give you the ability to sample your data yourself, before it ever reaches Google Analytics.

This has positive and negative consequences.

Positive

  • You can control your sampling rate. 1 in 4? 1 in 5? Your choice.
  • Sampling will be even across your visitors. Data will be sent for one in every four visitors (or whatever rate you choose), not just when Google chooses to include it in a calculation.
  • You will have a much smaller total dataset stored online in your GA account. Reports will load faster, no sampling will be carried out on reports, and long tail pages will be more readable.
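
To illustrate what even, per-visitor sampling means, here is a minimal JavaScript sketch. It is not how Google Analytics actually makes the decision: the function names, the numeric visitor IDs and the modulo-100 bucketing are all assumptions chosen for illustration. The point is only that a given visitor is consistently in or out of the sample.

```javascript
// Illustrative only: NOT Google Analytics' real implementation.
// Decide from the visitor ID alone whether a visitor is in the sample,
// so the same visitor always gets the same answer.
function isVisitorSampled(visitorId, sampleRatePercent) {
  // Map the (hypothetical, numeric) visitor ID onto a bucket from 0 to 99.
  var bucket = visitorId % 100;
  return bucket < sampleRatePercent;
}

// Count how many visitors from a list would have their data sent.
function countSampled(visitorIds, sampleRatePercent) {
  var count = 0;
  for (var i = 0; i < visitorIds.length; i++) {
    if (isVisitorSampled(visitorIds[i], sampleRatePercent)) {
      count++;
    }
  }
  return count;
}

// 1,000 hypothetical visitors with IDs 0 to 999, sampled at 25% (1 in 4).
var visitorIds = [];
for (var id = 0; id < 1000; id++) {
  visitorIds.push(id);
}
console.log(countSampled(visitorIds, 25)); // prints 250
```

Because the decision depends only on the visitor, a sampled visitor’s data is kept for all of their pageviews rather than for a random scattering of them.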

Negative

  • This is destructive. This data is sampled before it ever reaches GA, so it’s not stored anywhere. No way to get it back. If a statistical oddity occurred and your sampling accidentally only caught visitors who bounced, that’s tough.

In my opinion, this is a biggie. You need to be sure that your site is large enough to iron out these statistical oddities.
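
As a rough way to judge “large enough”, the sketch below estimates the margin of error on a proportion such as bounce rate for a given sample size. It uses the standard 95% approximation 1.96 × √(p(1 − p)/n) and assumes visitors are sampled randomly and independently. The function names and the figures in the example are hypothetical, not taken from any real site.

```javascript
// Approximate 95% margin of error for a proportion (e.g. bounce rate)
// measured on a random sample of `sampleSize` visitors.
function marginOfError(proportion, sampleSize) {
  return 1.96 * Math.sqrt(proportion * (1 - proportion) / sampleSize);
}

// Expected number of visitors left after sampling at the given percentage.
function expectedSampleSize(totalVisitors, sampleRatePercent) {
  return Math.round(totalVisitors * sampleRatePercent / 100);
}

// Hypothetical sites with a 50% bounce rate, both sampled at 25%.
var smallSite = expectedSampleSize(2000, 25);    // 500 visitors kept
var largeSite = expectedSampleSize(2000000, 25); // 500,000 visitors kept

console.log(marginOfError(0.5, smallSite)); // roughly 0.044 (about 4.4 points)
console.log(marginOfError(0.5, largeSite)); // roughly 0.0014 (about 0.14 points)
```

On the small site, a reported 50% bounce rate could plausibly sit anywhere between about 46% and 54%. On the large one the uncertainty is negligible.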

When would you want to use this?

If you have a very large site, with a paid-for web analytics solution and a dedicated team of web analysts, you may want to use Google Analytics as well. Your marketing director might want quick and dirty access, your website team might want to be able to do quick tests. Google Analytics will allow you to set this up easily and do these quick analyses in a user-friendly way. Sampling will allow you to run these on a light data set and get simple information easily. But you’re not losing any data because your main solution is keeping it for you.

How would you implement it?

A simple change to the GA code will allow you to do this. Before you call the _trackPageview() function, insert the following line of JavaScript:

pageTracker._setSampleRate("80");

where 80 is replaced by whatever sample rate you want to use, expressed as a percentage (so 80 sends data for 80% of your visitors). If you think this is suitable for your site, go ahead and try it. But be warned: you can never recover the data you’re losing. Only give it a go if your site is large, a lot of your reports tell you they’re based on sampled data, and your long-tail reports are large and cumbersome to understand.
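
To show where that line sits relative to the rest of the page tag, here is a minimal sketch built on the classic ga.js calls (_getTracker, _setSampleRate, _trackPageview). The wrapper function setUpSampledTracking is hypothetical, not part of ga.js, and "UA-XXXXX-X" is a placeholder for your own account ID.

```javascript
// Hypothetical wrapper around the classic ga.js page tag.
// `gat` is the _gat object that ga.js defines once it has loaded.
function setUpSampledTracking(gat, accountId, sampleRatePercent) {
  var pageTracker = gat._getTracker(accountId);
  // Set the sample rate BEFORE sending the pageview.
  // ga.js expects the rate as a numeric string, e.g. "80".
  pageTracker._setSampleRate(String(sampleRatePercent));
  pageTracker._trackPageview();
  return pageTracker;
}

// Only runs in a browser where ga.js has already loaded.
if (typeof _gat !== "undefined") {
  setUpSampledTracking(_gat, "UA-XXXXX-X", 80);
}
```

Keeping the rate in one place like this makes it easy to change later, but the single line shown above does the same job in an existing tag.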
