Skip to main content

GOV.UK GA4 improvements changelog

This page details improvements made to the GOV.UK GA4 data collection and processing since January 2025.

Upcoming changes can be found on the improvements roadmap page.

For a more long-term history of changes to the GOV.UK GA4 data collection, please see the GOV.UK GA4 annotations report.

January 2025

New canonical URL field

A page’s canonical URL is captured in a new field/custom dimension sent with all events. This is available in the GOV.UK GA4 property, the Data API, and in BigQuery in both the raw and flattened datasets.

traffic_type parameter added to the GOV.UK GA4 flattened dataset

The traffic_type parameter is available in the GOV.UK GA4 flattened dataset to support analysis of GOV.UK GA4 data filters.

Resolved issues with incomplete data in some custom dimension fields in the GOV.UK GA4 flattened dataset

Issues with the query_string, ui_text, response, search_term, autocomplete_input, autocomplete_suggestions, and link_text fields in the GOV.UK GA4 flattened dataset have been resolved in the data processing going forwards. The processing for these fields was only including string values, but we observed that some data was coming through as having an integer or double data type. We have now edited the processing so that integer and double type data are included in these fields in the flattened table.

Taxonomy dimensions renamed in the GOV.UK GA4 flattened dataset

Four taxonomy dimensions have been renamed in the GOV.UK GA4 flattened dataset to help users find the correct fields. taxonomy_all_DEPRECATED has become taxonomy_all and taxonomy_all_ids_DEPRECATED has become taxonomy_all_ids, as these are the fields containing current taxonomy information. full_taxonomy has been renamed full_taxonomy_DEPRECATED and full_taxonomy_ids has been renamed full_taxonomy_ids_DEPRECATED because these fields have not been populated since November 2023.

Smokey test data filter updated with other test IP addresses

The ‘Smokey’ data filter set up in the GOV.UK GA4 property has been updated with the IP addresses now used for the E2E tests. The GOV.UK GA4 data quality notes contain further details on the potential test bot data being collected and notes on how to filter it out of reports.

February 2025

Bug fixed on contact form_complete events

A bug which led to contact type form_complete events firing on the completion of other forms across GOV.UK has now been fixed.

Page path (cleaned_page_location) field updated in the GOV.UK GA4 flattened dataset

The cleaned_page_location field in the GOV.UK GA4 flattened dataset has been updated following the implementation of the canonical URL. The cleaned_page_location now defaults to the canonical_url value, with the hostname removed, if it is available. If the canonical_url is null, the cleaned_page_location is calculated as before - by stripping the hostname, anchors and query string from the page_location.

Alternative methods of backfilling datasets investigated

We have investigated changing our backfilling processes, focussing on developing a process for backfilling individual columns of the GOV.UK GA4 flattened dataset. In this investigation, we learned how to backfill individual columns, but also that this was a fiddly and expensive task. We will only use this single column method on rare occasions, and will instead continue to prefer whole day backfilling, with human intervention required to delete the days needing to be backfilled.

Assets domain Search Console BigQuery export set up

We have set up the bulk export for the assets.publishing.service.gov.uk domain Search Console property. This should facilitate analysis of users’ journeys to files on that domain.

March 2025

Redaction added to target National Insurance numbers

Some users were spotted searching for their National Insurance numbers. We have added in additional redaction to ensure this data does not get collected in any search analytics data collection, and are deleting any collected National Insurance numbers that may have previously slipped through.

April 2025

Changed processing of the content_language field in the GOV.UK GA4 flattened dataset

We have noticed unexpected values appearing in the content_language field - for example, where the language changes between events on the same page, or multiple times in one user’s session. Previously, we were applying some processing to the content_language field to keep one consistent value across one user’s events on a given page, but we are no longer sure if that is the best approach and so we have removed this calculation. The values in the field now simply reflect the values collected with each event, to help us debug these unexpected values.

Updated processing of the GOV.UK GA4 flattened dataset to use the custom timestamp

We have added the custom timestamp field to window functions previously only using the event_timestamp. From the dates we tested this change with, we do not expect this to cause any difference in the values output - this is just being added as a fail-safe.

Resolved issues with some sessionised custom dimension fields in the GOV.UK GA4 flattened dataset

Issues were identified with fields which use window functions to generate session-scoped values in the GOV.UK GA4 flattened dataset. The processing for the affected fields involved window functions which generated the output values based on the client ID, session ID and (in some cases) page location. However, in these functions, the ga_session_id was not being correctly selected (the string value was being extracted instead of the integer value), and so data could be incorrect for users who had multiple sesssions in one day. This impacted the content_language, session_source, session_medium, public_updated_at, updated_at, content_id, browse_topic, publishing_app, first_published_at, taxonomy_all_ids, status_code, withdrawn, document_type, history, taxonomy_main_id, taxonomy_all, taxonomy_main, taxonomy_level_1, full_taxonomy_DEPRECATED, full_taxonomy_ids_DEPRECATED, rendering_app, organisations, schema_name, term, content, and campaign_id fields. In most cases the difference noticed when fixing the functions was minor (<0.05%), but session_source and session_medium were more significantly impacted (about 8% out). We have now edited the processing to fix these fields in the flattened table.

April 2025

GOV.UK GA4 flattened dataset refreshed

We have refreshed the entirety of the GOV.UK GA4 flattened dataset so that all processing improvements made over previous months are applied to the whole dataset.

This page was last reviewed on 22 April 2025. It needs to be reviewed again on 22 October 2025 .