
GOV.UK GA4 data quality

Google Analytics 4 (GA4) is used to collect data on the usage of GOV.UK. Information on how to understand and use GOV.UK GA4 data can be found in the Analysis section of this site.

There are a few known issues with this data, which are detailed below.

Data quality notes and annotations

The GOV.UK GA4 data annotations Looker Studio report contains notes on issues with the GOV.UK GA4 data collection and outside events that may impact GOV.UK usage.

Data quality notes can be added by GDS staff using the GOV.UK GA4 annotations form.

Known issues with the GOV.UK GA4 data

Some larger-scale issues that affect our data collection are not covered in the annotations report above and are instead detailed below. Shorter-term bugs and issues with our tracking can be found in the Looker Studio report above.

Data quality variance and content over time

Data was first collected into this property on 23/09/22. The events captured have changed significantly over time, so the quality of early data may be patchy.

The GOV.UK GA4 data annotations Looker Studio report contains information on when GOV.UK GA4 events were created.

Bot traffic

In GA4, traffic from known bots is automatically excluded.

We do not know how much other bot traffic is being recorded in our data.

Smokey tests

Some smoke tests are run on production GOV.UK, and the smoke test bot (Smokey) sometimes triggers analytics data collection.

These events can be identified and removed using the ‘Test data filter name’ dimension, which is populated with ‘Smokey’ if the data comes from a Smokey test IP address. This is a tiny amount of data, so it is unlikely to affect analysis.
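If you export event data and analyse it outside GA4, you can drop Smokey rows before counting anything. The Python sketch below is a minimal illustration, not GOV.UK tooling. It assumes each exported event is a dictionary and that the ‘Test data filter name’ dimension is stored under the hypothetical key test_data_filter_name. Adjust the key to match your own export.

```python
# Minimal sketch: drop events generated by the Smokey smoke-test bot.
# Assumption: each exported event is a dict, and the 'Test data filter name'
# dimension is stored under the hypothetical key "test_data_filter_name".
SMOKEY_VALUE = "Smokey"


def exclude_smokey(events):
    """Return only events that did not come from a Smokey test IP address."""
    return [e for e in events if e.get("test_data_filter_name") != SMOKEY_VALUE]


events = [
    {"event_name": "page_view", "test_data_filter_name": "Smokey"},
    {"event_name": "page_view", "test_data_filter_name": None},
    {"event_name": "navigation"},  # dimension missing entirely
]
print(len(exclude_smokey(events)))  # 2
```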

Users we miss

We know that a proportion of our users do not accept analytics cookies, so we do not collect any data from them.

Some users may also use ad blockers that prevent Google Analytics from working.

Incorrect event tracking

Assumptions in navigation tracking

To capture users clicking links in ways other than the usual left (primary) click, we set up navigation (link) tracking so that every click on a link fires a navigation event.

This may produce some false positives in the data, because right (secondary) clicks and other non-primary clicks do not necessarily mean the user followed the link. For example, a user might right-click a link to copy or inspect it. Data users who want to exclude these clicks from their dataset can do so using the ‘method’ custom dimension.

This is also why some duplication may occur between navigation and copy tracking: users who right-click a link and select ‘Copy’ will trigger both a navigation event and a copy event.
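The filtering on the ‘method’ dimension described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Events are dictionaries with event_name and method keys, and ‘secondary click’ is a hypothetical method value. Check which values actually appear in your data before relying on it.

```python
# Minimal sketch, not GOV.UK's own tooling.
# Assumptions: events are dicts with "event_name" and "method" keys, and
# "secondary click" is a hypothetical 'method' value; check the values that
# actually appear in your data before relying on this.
EXCLUDED_METHODS = {"secondary click"}


def likely_navigations(events):
    """Return navigation events whose method is not in EXCLUDED_METHODS."""
    return [
        e for e in events
        if e.get("event_name") == "navigation"
        and e.get("method") not in EXCLUDED_METHODS
    ]


events = [
    {"event_name": "navigation", "method": "primary click"},
    {"event_name": "navigation", "method": "secondary click"},
    {"event_name": "copy"},
]
print(likely_navigations(events))  # [{'event_name': 'navigation', 'method': 'primary click'}]
```

Under these assumptions, a right-click followed by ‘Copy’ produces a secondary-click navigation event. Excluding secondary clicks should therefore also remove the navigation half of those navigation/copy duplicates.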

‘Error’ in form_complete tracking

form_complete events fire on any view of the ‘results’ page. This means the same user can trigger multiple form_complete events if they refresh the page, or navigate away from it and back again. It is also possible for a user to trigger a form_complete event without having responded to or submitted the form themselves, for example if someone sends them a link to a results page. This happens because it was not technically possible to make form_complete events fire conditionally on users’ activity on other pages.
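One way to reduce inflation from refreshes and repeat visits is to count each session at most once per results page. The Python sketch below is a minimal illustration of that idea. It assumes events are dictionaries with hypothetical event_name, session_id and page_path keys. It does not address users who open a shared results link, because that would need information about their earlier activity in the session.

```python
# Minimal sketch: count each (session, results page) pair at most once so that
# refreshes and back-and-forth visits do not inflate form_complete counts.
# Assumption: events are dicts with hypothetical "event_name", "session_id"
# and "page_path" keys.
from collections import Counter


def form_completions_per_page(events):
    """Return {page_path: number of distinct sessions with a form_complete}."""
    pairs = {
        (e["session_id"], e["page_path"])
        for e in events
        if e.get("event_name") == "form_complete"
    }
    return dict(Counter(page for _, page in pairs))


events = [
    {"event_name": "form_complete", "session_id": "s1", "page_path": "/results"},
    {"event_name": "form_complete", "session_id": "s1", "page_path": "/results"},  # refresh
    {"event_name": "form_complete", "session_id": "s2", "page_path": "/results"},
    {"event_name": "page_view", "session_id": "s1", "page_path": "/results"},
]
print(form_completions_per_page(events))  # {'/results': 2}
```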

Further information on how and when form events are used can be found in our information on the GOV.UK GA4 data structure.

Truncated ecommerce (view_item_list) tracking on search results pages

Each ecommerce event (view_item_list) can have up to 200 items sent with it. When searches return more than 200 items, the items sent with each event will be truncated.

In some cases we have had to limit the number of items sent with each event further, because events were too large and were triggering error 413 (Payload Too Large). The maximum size of a single event sent to GA4 appears to be 16KB, so we have implemented a limit that cuts off the ecommerce items array at 15KB to make sure our events are small enough to send (the remaining 1KB or so may be needed for the other information sent with the event).
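The two limits described above can be combined into a single truncation rule. The GOV.UK tracking itself runs as JavaScript in the browser, so the Python sketch below only illustrates the idea and is not the actual implementation. It assumes that ‘15KB’ means 15 × 1024 bytes. It measures size as the UTF-8 length of the JSON-encoded items list, which only approximates what is actually sent to GA4.

```python
import json

MAX_ITEMS = 200              # GA4 limit on items per ecommerce event
MAX_ITEMS_BYTES = 15 * 1024  # assumption: "15KB" read as 15 x 1024 bytes


def truncate_items(items, max_items=MAX_ITEMS, max_bytes=MAX_ITEMS_BYTES):
    """Return the longest prefix of items that fits both limits.

    Size is measured as the UTF-8 length of the JSON-encoded list, which is
    only an approximation of the bytes actually sent to GA4.
    """
    kept = []
    for item in items[:max_items]:
        candidate = kept + [item]
        if len(json.dumps(candidate).encode("utf-8")) > max_bytes:
            break
        kept = candidate
    return kept


small = [{"item_id": f"/page-{i}"} for i in range(300)]
large = [{"item_id": "x" * 1000} for _ in range(50)]
print(len(truncate_items(small)))  # 200: the item count limit applies first
print(len(truncate_items(large)))  # 15: the byte limit applies first
```

The loop keeps items in their original order, so the items that survive are always the top-ranked results.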

Incorrect information in dimensions

Parameter character count limits

GA4 limits the value of every event parameter (custom dimension) to a maximum of 500 characters (for GA4 360 properties).

The only exception to this is the page_location parameter, which must be 1,000 characters or fewer.
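When analysing long values such as search terms or link text, it can help to flag values that have reached these limits. The Python sketch below assumes that GA4 cuts off over-long values at the limit rather than dropping them. Under that assumption, a value whose length equals the limit may be incomplete. The function name is hypothetical.

```python
# Minimal sketch: flag parameter values that may have been cut off.
# Assumption: GA4 truncates over-long values at the limit, so a value whose
# length has reached the limit may be incomplete.
DEFAULT_LIMIT = 500                     # GA4 360 limit for parameter values
PARAM_LIMITS = {"page_location": 1000}  # the exception noted above


def may_be_truncated(name, value):
    """Return True if value has reached the character limit for name."""
    limit = PARAM_LIMITS.get(name, DEFAULT_LIMIT)
    return len(value) >= limit


print(may_be_truncated("search_term", "a" * 500))    # True
print(may_be_truncated("page_location", "a" * 500))  # False
```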

Inconsistencies in ‘outbound’ values in attachment events

On pages with attachment links, clicks on the links to download files come through with the outbound dimension set to ‘true’, as expected (these files are hosted on https://assets.publishing.service.gov.uk). However, clicks on the preview link (‘View online’) come through with an outbound value of ‘false’, even though the preview is also hosted on https://assets.publishing.service.gov.uk.

This is because, in the source HTML, the preview link contains only the page path and is redirected to the assets domain, so it appears to our tracking code as if the file were hosted on www.gov.uk.
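The mismatch follows from comparing hostnames before any redirect happens. The Python sketch below is a simplified model of hostname-based outbound detection, not GOV.UK's actual tracking code. The page URL and attachment paths are hypothetical. A relative href resolves against the current page, so it counts as internal even though the server later redirects it to the assets domain.

```python
# Minimal sketch of hostname-based outbound detection (not GOV.UK's actual
# tracking code). A relative href resolves against the current page, so it is
# treated as internal even if the server later redirects it elsewhere.
from urllib.parse import urljoin, urlparse


def is_outbound(href, page_url):
    """Return True if href points to a different host from page_url."""
    resolved = urljoin(page_url, href)
    return urlparse(resolved).hostname != urlparse(page_url).hostname


page = "https://www.gov.uk/government/publications/example"  # hypothetical page
print(is_outbound("https://assets.publishing.service.gov.uk/media/abc/report.pdf", page))  # True
print(is_outbound("/media/abc/report.pdf/preview", page))  # False
```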

Issues with users accessing GOV.UK in different languages

We capture the language a page is written in using the content_language dimension. Most pages on GOV.UK are written in English, although there are a few pages in Ukrainian, Russian, and other languages.

Separately, the user’s browser language is captured in the language dimension. However, users can translate page content in several ways, for example using browser add-ons. If the browser translates the page content after the page_view event is sent, the page_view will be sent with details in the original language (in most cases, English), although the text and other dimensions sent with later interactions on that page may be translated.

We have hard-coded a few dimensions in English to make them easier to analyse, for example the ‘section’ value on related content link clicks. In most cases, however, this was not possible because of the way content is surfaced through the Content API on GOV.UK.
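A rough way to find interactions whose text might have been translated is to compare the browser language with the page's content language. The Python sketch below illustrates that check. A mismatch is only a signal, not proof that translation happened. The sketch assumes events are dictionaries with hypothetical language (for example ‘fr-fr’) and content_language (for example ‘en’) keys.

```python
# Minimal sketch: flag events where the browser language differs from the
# page's content language, as a rough signal that interaction text might have
# been machine-translated. A mismatch does not prove translation happened.
# Assumption: events are dicts with hypothetical "language" (e.g. "fr-fr")
# and "content_language" (e.g. "en") keys.
def primary_subtag(tag):
    """Return the primary language subtag, e.g. 'en' for 'en-GB'."""
    return (tag or "").split("-")[0].strip().lower()


def possibly_translated(event):
    """Return True if browser and content languages have different primary subtags."""
    return primary_subtag(event.get("language")) != primary_subtag(
        event.get("content_language")
    )


print(possibly_translated({"language": "fr-fr", "content_language": "en"}))  # True
print(possibly_translated({"language": "en-GB", "content_language": "en"}))  # False
```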

Incorrect link domain for URLs with an extra slash

When there is an extra / in the URL, the ‘Link domain’ information is incorrect: it comes through as the first part of the path instead of the domain. This can be seen on the live site, for example by interacting with the Contents links on the https://www.gov.uk//guidance/cost-of-living-payment#low-income-benefits-and-tax-credits-cost-of-living-payment-eligibility page. Strictly, this URL should not be valid, but it (like many other incorrect URLs) still loads content on GOV.UK.

URLs like this are rarely used, so this should not cause a significant data quality issue.
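One plausible explanation, not confirmed here, is that URL parsers read a path beginning with // as a protocol-relative URL and so treat the first path segment as the host. The Python sketch below shows that parsing behaviour with the standard library. It also includes a simple clean-up step that collapses repeated slashes in a page path before analysis. normalise_path is a hypothetical helper, not part of GOV.UK's tracking, and should only be applied to paths, not to full URLs, because it would also collapse the // after https:.

```python
# Sketch: why "//guidance/..." can yield the wrong domain, and a clean-up step.
import re
from urllib.parse import urlparse

path = "//guidance/cost-of-living-payment"
# A string starting with "//" is parsed as a protocol-relative URL, so the
# first path segment is treated as the network location (domain).
print(urlparse(path).netloc)  # guidance


def normalise_path(p):
    """Collapse runs of slashes in a path, e.g. '//guidance/x' -> '/guidance/x'."""
    return re.sub(r"/{2,}", "/", p)


print(normalise_path(path))  # /guidance/cost-of-living-payment
```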

Issues with publication and update dates

The first_published_at, public_updated_at and updated_at dates sent with page view events may be misleading.

This is most likely for content items published between 11pm and 1am (an hour either side of midnight), depending on whether the item was published during Greenwich Mean Time (GMT) or British Summer Time (BST). This is because, to extract the date we record in the custom dimension, we strip the time information from the Content API timestamp, leaving the date in YYYY-MM-DD format.

Previous work looking into timestamps in Whitehall Publisher CSVs found that Content API timestamps are in GMT. For example, an item timestamped ‘2014-08-31 23:00:00’ is displayed on the page as ‘1 September 2014’ (published at midnight BST).

We have not yet investigated whether this has an impact on these GA4 dimensions.
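To show how the two approaches can diverge, the Python sketch below compares stripping the time with converting the timestamp to UK local time first. It assumes, as the earlier Whitehall work found, that timestamps are in GMT and use the ‘YYYY-MM-DD HH:MM:SS’ form quoted above. It hard-codes the current UK clock-change rules (BST from 01:00 GMT on the last Sunday in March to 01:00 GMT on the last Sunday in October), so it needs no time zone database. All function names are hypothetical.

```python
from datetime import datetime, timedelta


def last_sunday(year, month):
    """Return midnight on the last Sunday of the given month."""
    if month == 12:
        first_of_next = datetime(year + 1, 1, 1)
    else:
        first_of_next = datetime(year, month + 1, 1)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday is 0, Sunday is 6
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)


def uk_offset(gmt_dt):
    """Return the UK offset from GMT: one hour during BST, otherwise zero."""
    bst_start = last_sunday(gmt_dt.year, 3).replace(hour=1)
    bst_end = last_sunday(gmt_dt.year, 10).replace(hour=1)
    return timedelta(hours=1) if bst_start <= gmt_dt < bst_end else timedelta(0)


def stripped_date(ts):
    """Mirror the stripping described above: keep YYYY-MM-DD, drop the time."""
    return ts[:10]


def uk_display_date(ts):
    """Convert a GMT timestamp to the UK calendar date shown on the page."""
    gmt_dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return (gmt_dt + uk_offset(gmt_dt)).date().isoformat()


ts = "2014-08-31 23:00:00"
print(stripped_date(ts))    # 2014-08-31
print(uk_display_date(ts))  # 2014-09-01
```

Where the system time zone database is available, converting with Python's zoneinfo.ZoneInfo("Europe/London") is a more robust alternative to hard-coding the clock-change rules.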

This page was last reviewed on 28 November 2024. It needs to be reviewed again on 28 May 2025.
This page was set to be reviewed before 28 May 2025. This might mean the content is out of date.