Google Analytics (GA) is the most popular digital measurement and reporting tool in the world. The data collected and measured in GA is used for standard activities such as reporting on website performance, but also for more advanced work such as UX and web application design, informing media strategy, and other forward-looking activities.
Regardless of how GA data is being leveraged, sound analytics is built on a foundation of data integrity. Throughout our engagements with clients using Google Analytics, we’ve recognized a pattern of critical data integrity issues that often go unnoticed. As we lay out five common mistakes in Google Analytics implementations, we encourage you to review your GA configuration and confirm the integrity of your data.
Incorrect Google Analytics Implementation
Historically, Google has offered multiple ways to implement Google Analytics on a web application. One unfortunate consequence of this flexibility is that some web applications end up with multiple GA implementations. How does this happen? Google doesn’t do us any favors right out of the gate by maintaining two analytics libraries, gtag.js and analytics.js, which leads to some confusion. Counting Google Tag Manager (GTM), Google Analytics can be installed on a website in three different (correct) ways and, unfortunately, in countless incorrect ways. One common mistake is for the GA code to fire twice on the same page, often because the older analytics.js code is still present and someone (usually at a later date) implements the same GA property via the GTM container. This can cause serious data integrity issues such as double-counting. Figure 1 below shows what the Google Tag Assistant Chrome browser extension looks like when the GA code is loaded twice on the same page.
Figure 1: Notice that the Google Analytics tag on the left is yellow. Clicking on the yellow GA tag opens the pane on the right, which displays the specific warning “The same web property ID is tracked twice”.
In the above example, the fix was simply to remove the superfluous analytics.js code from the site’s HTML and ensure that any GA settings (e.g., cross-domain linker, IP anonymization, etc.) were migrated to the Google Analytics Settings variable in GTM. Care must also be taken to scour the site’s HTML for any other GA tags so that they can be replicated in GTM as well. These include, but are not limited to, event, remarketing, and ecommerce tags that might have been placed manually in the site’s HTML.
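Before reaching for Tag Assistant, a rough static scan of a page’s source can reveal the situation described above. The Python sketch below is a heuristic, not a Google tool: its regular expressions assume the standard analytics.js (ga('create', ...)) and gtag.js (gtag('config', ...)) snippet formats and Universal Analytics “UA-” property IDs, and the function name audit_ga_tags is hypothetical. It only sees tags hard-coded in the HTML, so tags fired from inside a GTM container still need to be confirmed in the container itself or with Tag Assistant.

```python
import re
from collections import Counter

# Heuristic patterns (assumed, not exhaustive) for hard-coded GA snippets.
# analytics.js: ga('create', 'UA-XXXXX-Y', 'auto');
ANALYTICS_JS = re.compile(r"""ga\(\s*['"]create['"]\s*,\s*['"](UA-\d+-\d+)['"]""")
# gtag.js: gtag('config', 'UA-XXXXX-Y');
GTAG_JS = re.compile(r"""gtag\(\s*['"]config['"]\s*,\s*['"](UA-\d+-\d+)['"]""")
# GTM container IDs anywhere in the page (script snippet or <noscript> iframe).
GTM_ID = re.compile(r"GTM-[A-Z0-9]+")


def audit_ga_tags(html: str) -> dict:
    """Report hard-coded GA property IDs, GTM containers, and likely problems."""
    hard_coded = Counter(ANALYTICS_JS.findall(html) + GTAG_JS.findall(html))
    containers = sorted(set(GTM_ID.findall(html)))
    warnings = []
    for property_id, count in hard_coded.items():
        if count > 1:
            warnings.append(f"{property_id} is hard-coded {count} times")
    if hard_coded and containers:
        # The container may fire the same property; this cannot be seen in the HTML.
        warnings.append(
            "hard-coded GA found alongside " + ", ".join(containers)
            + ": check the container for the same property, then migrate to GTM"
        )
    return {"hard_coded": dict(hard_coded), "gtm": containers, "warnings": warnings}
```

A page that hard-codes UA-12345-1 with analytics.js and also loads a GTM container produces one warning, prompting a check of that container’s GA tags.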
Migrating the GA code to the tag manager is recommended because doing so adds functionality and streamlines data collection, while reducing the need to involve website development resources whenever you update your website’s data collection. For this and other reasons, we prefer to install Google Analytics using Google Tag Manager. An additional reason for installing GA this way is that any future changes Google makes to how the analytics code works should be applied automatically. A similar situation occurred recently when Google Ads (formerly AdWords) and Google Campaign Manager (formerly DoubleClick Campaign Manager) tags were changed to use the gtag.js global site tag. For websites with discrete remarketing and conversion tags hard-coded in the web application, switching to the new code could be onerous. But for those who had installed their AdWords and Floodlight tags using the templates in their GTM container, no code update was necessary, because Google made those changes automatically to the GTM tag templates.
Duplicate Google Tag Manager Instances
Figure 2: Notice that both cases (left and right) show the warning “Multiple installations of Google Tag Manager detected”, but again, this is usually a problem only if the same GTM container script is executed twice, not if two separate container scripts, each with a unique GTM container ID, are present.
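Because the GTM <noscript> fallback iframe repeats the container ID, simply counting IDs in the page source overstates duplication. The sketch below is again a heuristic under stated assumptions (the standard GTM snippet, the ns.html?id= iframe URL, and a hypothetical function name): it counts only script-snippet occurrences per container ID and reports IDs that appear more than once. Two different container IDs are not flagged, consistent with the caption above.

```python
import re
from collections import Counter

# Any GTM container ID in the page source.
GTM_ID = re.compile(r"GTM-[A-Z0-9]+")
# The <noscript> fallback loads ns.html?id=GTM-...; it is not a second script load.
NOSCRIPT_ID = re.compile(r"ns\.html\?id=(GTM-[A-Z0-9]+)")


def duplicate_gtm_containers(html: str) -> dict:
    """Map each GTM container ID loaded by more than one script snippet to its count."""
    all_ids = Counter(GTM_ID.findall(html))
    noscript_ids = Counter(NOSCRIPT_ID.findall(html))
    script_ids = all_ids - noscript_ids  # Counter subtraction drops zero/negative counts
    return {container: n for container, n in script_ids.items() if n > 1}
```

An empty result means no container ID appears in more than one script snippet of the static HTML; containers injected by other scripts still require Tag Assistant to detect.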
UTM Parameter Misuse (and Omission)
Moving away from issues with GA and GTM code implementation, probably the most common data integrity issue related to Google Analytics results from UTM campaign parameter misuse or omission. For readers unfamiliar with UTM tracking: UTM codes can be manually appended to URL links (for example, in an email or a social media post), and when someone clicks the tagged link, Google Analytics uses the information embedded in the UTM parameters to record the source of that traffic to your website. More technically, UTM codes are a set of five query string key-value pairs that Google Analytics uses to classify website traffic. Admittedly, there is a lot of room for personal preference regarding UTM code usage and naming conventions. It may be fair to say that there isn’t any single “correct” way to use UTM codes, but there are demonstrably many ways to misuse them.
A good general rule we recommend is that campaign URL links should always be tagged with the source, medium, campaign, and content UTM parameters, appended in that order. We suggest reserving the term (keyword) UTM parameter for paid search only, so that keyword reports do not have to be scrubbed of extraneous data.
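As an illustration of the tagging rule above, here is a minimal Python sketch built around a hypothetical helper named build_campaign_url. It refuses to build a link when source, medium, campaign, or content is missing, appends the parameters in that order, and adds utm_term only when a keyword is supplied. Lower-casing values at build time is our own assumption, intended as a complement to (not a replacement for) a lower-case filter in GA.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

REQUIRED = ("source", "medium", "campaign", "content")


def build_campaign_url(base_url, source, medium, campaign, content, term=None):
    """Append UTM parameters in source, medium, campaign, content (, term) order."""
    values = {"source": source, "medium": medium, "campaign": campaign, "content": content}
    missing = [name for name in REQUIRED if not values[name]]
    if missing:
        raise ValueError(f"missing UTM value(s): {', '.join(missing)}")
    params = [(f"utm_{name}", values[name].strip().lower()) for name in REQUIRED]
    if term:  # reserve utm_term for paid search keywords
        params.append(("utm_term", term.strip().lower()))
    scheme, netloc, path, query, fragment = urlsplit(base_url)
    utm_query = urlencode(params)
    # Keep any existing query parameters; urlunsplit puts the fragment after the query.
    query = f"{query}&{utm_query}" if query else utm_query
    return urlunsplit((scheme, netloc, path, query, fragment))
```

Passing a base URL that already has an anchor, such as “https://www.site.com/page#faq”, keeps the anchor at the end, after the UTM query string.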
In our experience, best practice is to use autotagging when linking Google Ads accounts to Google Analytics, except in rare circumstances. Autotagging (available for Google paid search and display ads) uses the campaign and ad group information already entered in your Google Ads account to automatically populate campaign data in your GA reports via a gclid query string parameter rather than explicit UTM codes. This requires linking your Google Analytics and Google Ads accounts in the GA Property settings under “Product Linking”.
Part of what gives UTM codes their utility, and what differentiates them from gclids, fbclids, and Adobe’s tagging schema, is that UTM codes are human-readable. They don’t have to be human-readable, nor must they always follow the order stated above, but in our experience following these conventions decreases campaign tracking errors because it makes UTM tagging easier to QA. For example, instead of “utm_campaign=june_2019_grand_opening” you could use “utm_campaign=nXt687tRe-413” and simply map nXt687tRe-413 to your June ’19 Grand Opening campaign in your data ETL process, but the media planners you work with would probably prefer the former.
Setting aside personal preferences, the most common UTM mistake we come across is simply failing to set one of the UTM parameters, as seen in Figure 3 below. Because “medium” has not been set in any of the ten rows, GA categorizes all of this paid traffic into the “(Other)” default channel grouping. Traffic channel misidentification errors can be cleaned up retrospectively in an ETL step, but this is far from ideal. In addition to parameter omission errors, UTM codes are case-sensitive (and spelling-sensitive too!), so we recommend using a filter in Google Analytics that converts all UTM values to lower case. This prevents GA from splitting sessions with utm_campaign=brand19 and utm_campaign=Brand19 into separate pools when they should be grouped together.
Figure 3: Systematic failure to set a value for “utm_medium” has resulted in large amounts of campaign traffic being classified as “(Other)” in the Google Analytics default channel grouping. Additional data work will be required to sort these traffic sources into the appropriate channels for reporting purposes.
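A pre-launch QA pass over tagged links can catch the kind of omission shown in Figure 3 before it reaches GA. The sketch below is hypothetical (qa_campaign_url is not a library function): it reports missing or blank required UTM parameters and flags values that are not lower case. Assuming your convention marks paid search with a medium of “cpc” or “ppc”, it also flags utm_term used outside paid search.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_UTMS = ("utm_source", "utm_medium", "utm_campaign", "utm_content")
# Assumed naming convention for paid search mediums; adjust to your own.
PAID_SEARCH_MEDIUMS = {"cpc", "ppc"}


def qa_campaign_url(url: str) -> list[str]:
    """Return the problems found in a UTM-tagged URL; an empty list means it passed."""
    # parse_qs drops blank values by default, so "utm_medium=" counts as missing.
    query = parse_qs(urlsplit(url).query)
    problems = []
    for key in REQUIRED_UTMS:
        values = query.get(key)
        if not values or not values[0].strip():
            problems.append(f"missing {key}")
        elif values[0] != values[0].lower():
            problems.append(f"{key} is not lower case: {values[0]!r}")
    medium = query.get("utm_medium", [""])[0].lower()
    if "utm_term" in query and medium not in PAID_SEARCH_MEDIUMS:
        problems.append("utm_term set on a non-paid-search link")
    return problems
```

Running every link in a campaign sheet through a check like this before launch is cheaper than reclassifying “(Other)” traffic in an ETL step afterwards.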
Query String Exclusion and Anchor Link Troubles
Have you noticed the same page appearing multiple times in your Landing Page report with different query strings at the end? Query string inclusion might show up in your reports as a list of single-session pageviews of your homepage with “?fbclid=123abc” included in the page path, where 123abc is the unique click ID that Facebook uses to track its users’ on-site behavior. Preferably, all of your homepage’s pageviews would be grouped together, so that you could then easily sort them by source/medium to identify your social media traffic from Facebook.
Google Analytics is usually very good about stripping UTM query strings from page paths, but if you are using other web analytics tools in addition to Google Analytics (e.g., HubSpot, Adobe, Webtrends, etc.), or if Facebook click IDs (fbclids) are being appended as query strings, you will need to exclude these parameters manually using the “Exclude URL Query Parameters” option in “View Settings”. Failure to exclude these query string parameters will result in pageviews (and landing page views) from different sources not being properly aggregated in your GA reports. For example, your homepage could show up as “/” but also as “/?WT_abc”, where abc can be any series of Webtrends query string parameters and values. As seen in Figure 4 below, this can result in the homepage showing up hundreds of times with different Webtrends query string values appended! GA also has problems when the question mark appears twice in a query string, as in the fourth line of Figure 4.
Figure 4: Landing page views of the homepage, “/”, with multiple variations of non-UTM query strings causing unnecessary data fragmentation. Because GA by default removes only Google query strings (utm, gclid, etc.), the Webtrends query strings should be excluded manually as explained above. The data shown is fragmented, which makes calculating the bounce rate for all sessions landing on the homepage from all traffic sources needlessly onerous.
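If fragmented page paths have already been exported, the aggregation can be approximated in an ETL step. This Python sketch mimics the effect of excluding query parameters; it is not GA’s own logic. The exclusion list is an assumption: “fbclid” by exact name, plus any parameter whose name starts with “WT” as a stand-in for Webtrends parameters. A second question mark is treated as an ordinary “&” separator, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed exclusion rules: exact parameter names plus name prefixes.
EXCLUDED_NAMES = {"fbclid"}
EXCLUDED_PREFIXES = ("WT",)


def clean_page_path(page_path: str) -> str:
    """Drop excluded query parameters from a page path, keeping the rest in order."""
    # Treat any "?" after the first one as a "&" parameter separator.
    head, sep, tail = page_path.partition("?")
    parts = urlsplit(head + sep + tail.replace("?", "&"))
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in EXCLUDED_NAMES and not key.startswith(EXCLUDED_PREFIXES)
    ]
    return urlunsplit(("", "", parts.path, urlencode(kept), ""))


def aggregate_landing_pages(rows):
    """Sum landing page views per cleaned path; rows are (page_path, views) pairs."""
    totals = Counter()
    for page_path, views in rows:
        totals[clean_page_path(page_path)] += views
    return dict(totals)
```

Remaining parameters are re-encoded by urlencode, so a path such as “/?q=a%20b” comes back as “/?q=a+b”; both forms decode to the same value.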
A similar GA data ingestion problem occurs when an anchor link (URL fragment) is placed in a URL before the UTM query string. Whereas a query string must start with a question mark (and should contain only a single question mark), an anchor link begins with a hash sign (#) and should always be appended after any query string. The general rules for URL and URI formatting can be found here.
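The sketch below repairs links in which the anchor was placed before the query string. Standard URL parsers (here Python’s urllib.parse.urlsplit) treat everything after the first # as the fragment, so in “/page#pricing?utm_source=bing” the UTM parameters end up inside the fragment rather than in the query string. The helper, whose name is hypothetical, moves them into the query string and puts the anchor back at the end.

```python
from urllib.parse import urlsplit, urlunsplit


def move_query_out_of_fragment(url: str) -> str:
    """Rewrite page#anchor?utm_... as page?utm_...#anchor."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    anchor, sep, misplaced_query = fragment.partition("?")
    if not sep:
        return url  # the fragment holds no query string; nothing to fix
    query = f"{query}&{misplaced_query}" if query else misplaced_query
    return urlunsplit((scheme, netloc, path, query, anchor))
```

Correctly ordered URLs are returned unchanged, so the function can safely be run over an entire list of campaign links.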
Not Using Site Search and Google Search Console
The final common problem we encounter is more accurately described as under-utilization of Google Analytics. Two of the most insightful pieces of data we like to include in our clients’ website performance reporting are the search queries users enter into on-site search (if implemented) and the search queries people use to arrive at the site from Google organic search. These reports are not available in a default GA installation, but they are very easy to set up. On-site search reports need to be enabled in the “View Settings” options, where the query parameter must be entered manually; which parameter is used will vary from site to site. The site search report shows what users search for on your site, revealing information that visitors want but might not be able to find easily. Both digital marketing and website design can benefit from this information.
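To identify a site’s search query parameter from an exported list of All Pages report paths and preview the search terms, something like the following Python sketch could be used. The function names are hypothetical, and “s” is only an example parameter; substitute whatever your site uses. urllib.parse.parse_qs decodes both “+” and percent-encoded spaces, so “Search+Terms” becomes “Search Terms”.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def query_parameter_counts(page_paths):
    """Count how many page paths carry each query parameter name."""
    counts = Counter()
    for page_path in page_paths:
        counts.update(parse_qs(urlsplit(page_path).query, keep_blank_values=True).keys())
    return counts


def search_terms(page_paths, param="s"):
    """Tally decoded, lower-cased site-search terms for the given query parameter."""
    terms = Counter()
    for page_path in page_paths:
        for value in parse_qs(urlsplit(page_path).query).get(param, []):
            terms[value.strip().lower()] += 1
    return terms
```

A parameter that appears on many paths with free-text values is a good candidate for the site-search setting; confirming it with your web development resources is still advisable.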
Lastly, arguably some of the best opportunities for website and business optimization come from the data housed in Google Search Console reports. It is surprising how many GA accounts we run into that do not have this reporting function enabled and linked to their GA property, as seen below.
Figure 5: Google Search Console requires an additional integration step but is well worth the effort. The data available from GSC and GA integration is among the most valuable to an online marketer.
One reason that the Google Search Console report might be underutilized is that, unlike other GA reporting, it requires a verification step before the data can be accessed. Essentially, you need to prove to Google that you own, or otherwise have ownership-level permissions on, the website in question. The two easiest ways to do this are via Google Tag Manager, if you have user management permissions, or by inserting a meta tag in the <head> of the site’s HTML. Recently, Google released DNS TXT record verification, which may in the future become the favored method of Search Console verification, but we do not yet have enough experience with this method under varying circumstances to recommend it for everyone. Its advantage is that it should work for both the http and https versions of the same domain simultaneously.
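For the meta tag method, a quick check that the verification tag actually made it into the <head> can save a round trip to Search Console. The sketch below uses Python’s built-in html.parser and assumes the google-site-verification meta tag name that Search Console issues; find_verification_tokens is a hypothetical name, and the token in the example is made up. Meta tags outside the <head> are ignored, because the tag is expected in the <head>.

```python
from html.parser import HTMLParser


class _VerificationMetaFinder(HTMLParser):
    """Collect google-site-verification tokens from <meta> tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if attributes.get("name") == "google-site-verification":
                self.tokens.append(attributes.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_verification_tokens(html: str) -> list[str]:
    """Return verification tokens found in the page's <head> (empty if none)."""
    finder = _VerificationMetaFinder()
    finder.feed(html)
    return finder.tokens
```

Self-closing tags such as <meta ... /> are handled too, because HTMLParser routes them through handle_starttag by default.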
The free version of Google Analytics is really quite powerful and has an ROI that can’t be beat, but because it does not offer tech support, and because there is some confusion over the three different ways it can be implemented (analytics.js, gtag.js, and via GTM), a significant minority of the GA installations we encounter are either underutilizing the tool or have other, more serious issues. The integrity of analytics data necessarily starts with how it is collected, and although some mistakes can be corrected retroactively in an ETL process, other data integrity issues can have serious negative business consequences. In our experience, it is much more cost-effective to fix a GA data ingestion mistake before you set out to do any serious analysis.
Thanks for reading, and if you are interested in learning more about Optimization Group’s work in website analytics, please contact us!
 User Experience, defined as “a person’s perceptions and responses that result from the use or anticipated use of a product, system or service.” (cf. ISO 9241-210:2010)
 The problem we are highlighting here is multiple instantiations of the same GA property ID; it is possible, and sometimes necessary, to correctly install multiple GA properties on the same website.
 This free tool is available here. Google tags show green if firing correctly, blue if the implementation is non-standard but usually “OK” (GA implemented via GTM will be blue), yellow if the tag is firing but with a problem, and red if the tag is not working. Details of tag problems are available by clicking on the tag (right pane in Figure 1).
 Floodlights are website tags used to send information to Google Campaign Manager and Search Ads 360 (formerly DoubleClick Campaign Manager and DoubleClick Search).
 In the example on the right of Figure 2, efforts (albeit non-exhaustive) to clean the duplicated code out of the website’s HTML were ineffective; they only resulted in deleting the <noscript> iframe from the <body>, while the duplicate asynchronous GTM server calls remained in the <head>.
 The “UTM” abbreviation stands for “Urchin Tracking Module” and is named after Urchin, the software company Google acquired in 2005, which was the genesis of Google Analytics as we know it.
 The “keys” here are most often referred to as parameters (or UTM parameters or UTM codes). The five parameters are utm_campaign, utm_source, utm_medium, utm_content, and utm_term. Each parameter is set to a value with “=”, and parameters are separated from each other by “&”. More information about using UTM parameters can be found here, and by using Google’s URL building tool here.
 We have come across some cases where 3rd party integrations require UTM codes to be appended in addition to using the gclid’s (Google Click IDs) from autotagging.
 Extract, Transform, Load; general shorthand for data post-processing using various software or cloud solutions (e.g., MySQL, Python, etc.). Most Business Intelligence (BI) tools have this capability.
 Even if you are using multiple campaign tracking schemas, for example UTM codes and Webtrends, the tracking query string should contain only a single question mark at the beginning. Any additional tracking parameters separate from UTM codes can be added using “&” as the parameter separator. For example: www.site.com?utm_source=bing&utm_medium=cpc&utm_campaign=brand&utm_content=sale&WTsrc=bcpc
 An example of what you would need to enter in the query parameter box is “s”. How do you find out what query parameter your site uses? If site-search is used on your website, you should be able to find the search query string appended to page URIs in your All Pages Report that look like this: “/page_path?s=Search+Terms”. The “?s=” indicates that your query parameter is the letter “s”. Alternatively, you can involve your web development resources and they should be able to inform you how site-search has been implemented on your website.
 Return on Investment