What is data normalization?
Normalization is typically a process applied in relational database design to organise data, aiming to reduce redundancy and improve consistency and accuracy. Generally, the process takes place after the data has been collected. The main goal is to consolidate variant forms of the same data item into a single form and provide a clean dataset which you can query and analyse.
However, normalization does not always have to be performed on data that has already been collected – you can apply it during the collection stage.
Data normalization in Analytics tools
Normalization is equally important in tools such as Google Analytics (which essentially relies on data stored in a database). One example of when normalization is required is case formatting. In Google Analytics, strings are case-sensitive: values that differ only in letter case are treated as distinct entries (e.g. "String", "string" and "STRING" are three different entries). This is why best practice is to always account for casing and have a tracking setup that does the normalization during data collection.
Think about capturing a custom dimension that takes the value of a user entry. The user can type it in UPPERCASE, in lowercase, or leave it empty. You then end up with multiple variations of what is potentially the same entry, which makes analysis harder and means the data has to be cleaned each time it is used.
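As a minimal sketch of what normalizing during collection means for the case example, the snippet below lowercases each entry before it would be sent. The function name normalizeEntry and the sample values are illustrative assumptions, not part of any Google Analytics or GTM API.

```typescript
// Hypothetical helper: force a user-entered value into a single case
// before it is sent as a custom dimension value.
function normalizeEntry(raw: string): string {
  return raw.toLowerCase();
}

// Three variations of the same entry, as a user might type them.
const rawEntries: string[] = ["String", "string", "STRING"];

// Normalize every entry, then keep only the first occurrence of each value.
const distinctValues: string[] = rawEntries
  .map(normalizeEntry)
  .filter((value, index, all) => all.indexOf(value) === index);

console.log(distinctValues); // a single value: "string"
```

Without the normalization step the same filter would keep all three variations, which is the situation the reports end up in when casing is not handled at collection time.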
Additionally, in the example above, if the value returned for the custom dimension is null (i.e. it is empty), we have to convert it to undefined to make sure it is ignored by GA rather than recorded as the string "null".
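The null handling can be sketched in the same way. The helper below is hypothetical, and treating an empty string like null is an assumption made for this illustration; the point is only that an empty value is turned into undefined instead of being passed along.

```typescript
// Hypothetical helper: return undefined for empty input so the value can be
// left out entirely instead of being recorded as the text "null".
// Assumption: an empty string is treated the same way as null.
function toDimensionValue(raw: string | null): string | undefined {
  if (raw === null || raw === "") {
    return undefined;
  }
  return raw.toLowerCase();
}

console.log(toDimensionValue(null));     // undefined
console.log(toDimensionValue("Newbie")); // "newbie"
```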
Recent GTM update that made the process easy
A recent update to Google Tag Manager added the Format Value option to all of its variables. Format Value allows you to modify the output of a variable with a number of predefined transformations, which are executed by the gtm.js library itself rather than by additional user-added code (which can be prone to errors and, in some cases, is not even an option).
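The options that are currently available are: Change Case to (lowercase or uppercase), Convert null to, Convert undefined to, Convert true to, and Convert false to.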
With these new options available, tackling problems such as those mentioned above is a lot more streamlined. Hopefully the list of formatting options and the flexibility of use will only grow moving forward.
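One way to picture a predefined transformation is as a small set of switches applied to whatever the variable returns. The sketch below is a mental model only: the FormatOptions shape, its field names and the formatValue function are assumptions made for illustration, and they are not how gtm.js is actually implemented.

```typescript
// Hypothetical option names; these are NOT GTM's internal API.
interface FormatOptions {
  changeCase?: "lowercase" | "uppercase";
  convertNull?: boolean; // when true, null is replaced with undefined
}

// Apply the configured transformations to a variable's output.
function formatValue(value: unknown, options: FormatOptions): unknown {
  if (value === null && options.convertNull) {
    return undefined;
  }
  if (typeof value === "string") {
    if (options.changeCase === "lowercase") {
      return value.toLowerCase();
    }
    if (options.changeCase === "uppercase") {
      return value.toUpperCase();
    }
  }
  // Anything not covered by an option passes through unchanged.
  return value;
}

console.log(formatValue("Shoes", { changeCase: "lowercase" })); // "shoes"
console.log(formatValue(null, { convertNull: true }));          // undefined
```

Keeping the transformations declarative like this is what removes the need for a hand-written Custom JavaScript variable for each clean-up step.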
Another example use of Format Value is applying it to URLs, for instance to force the Full URL variable into a single case.
Being able to format the Full URL like this means that less technical users no longer need to use RegEx to account for case differences when, for example, using the Full URL variable in a trigger condition.
Using contains/equals/starts with has always been the preferred option for non-developers who do not want to mess with RegEx.
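The difference between the two approaches can be illustrated outside of GTM. In the sketch below the URL and the "/products/" path are made-up values; the first check stands in for a case-insensitive RegEx condition and the second for a plain "contains" condition on a lowercased URL.

```typescript
// Made-up URL with mixed casing, as it might appear in the browser.
const fullUrl: string = "https://www.Example.com/Products/Shoes";

// Without formatting: a case-insensitive regular expression is needed.
const matchesWithRegex: boolean = /\/products\//i.test(fullUrl);

// With the URL lowercased first, a simple "contains" check is enough.
const lowercasedUrl: string = fullUrl.toLowerCase();
const matchesWithContains: boolean = lowercasedUrl.indexOf("/products/") !== -1;

console.log(matchesWithRegex, matchesWithContains); // true true
```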
By making sure you use best practices and advanced tools when setting up tracking, you produce cleaner data, which leads to better analysis and fewer inconsistencies to work around. Additionally, taking advantage of feature-rich platforms such as Google Tag Manager removes many complexities and makes configurations easier to understand for less technical users.
If you want to step up your data collection and increase data accuracy, reach out to the Analytics Team at Merkle|Periscopix!