Skip to main content

GOV.UK GA4 (BigQuery export)

We export our GOV.UK Google Analytics data and store it in Google BigQuery to enable more detailed analysis and the use of this data in other tools.

A flattened dataset is created from the live site data. This contains most, but not all, of the same fields as the raw data. The flattened dataset is easier and more efficient to query, so should be used for most analysis and reporting.

Access

Everyone with a @digital.cabinet-office.gov.uk account (all GDS staff) has access to this data by default.

In addition, everyone who requests and is granted access to the GOV.UK GA4 data will be given access to query this raw dataset. More information can be found in our GA access policy.

Location

There are 3 GA4 datasets. These correspond to the integration, staging, and production or live GOV.UK websites. The GA4 data for the live GOV.UK site is located in BigQuery in the ga4-analytics-352613.analytics_330577055 dataset. The GA4 data for the staging site is located in BigQuery in the ga4-analytics-352613.analytics_330580593 dataset. The GA4 data for the integration site is located in BigQuery in the ga4-analytics-352613.analytics_294475112 dataset.

These datasets are all comprised of date sharded tables - a new table is created each day with the suffix YYYYMMDD.

Our Google Analytics properties export GOV.UK data several times a day. The data for the current day is temporarily stored in intraday tables. At the end of the day, BigQuery automatically moves the data in the intraday tables to a date table (suffixed YYYYMMDD) and deletes the intraday tables in question. New intraday tables are created and added to throughout the next day.

The GA4 datasets are all stored within the GA4 analytics project which is linked from here along with our other GCP projects. For more information on the Google Cloud Platform projects, see our GCP Documentation.

Schema

This table uses the default GA4 BigQuery Export schema:

field name type mode
event_date STRING NULLABLE
event_timestamp INTEGER NULLABLE
event_name STRING NULLABLE
event_params RECORD REPEATED
event_previous_timestamp INTEGER NULLABLE
event_value_in_usd FLOAT NULLABLE
event_bundle_sequence_id INTEGER NULLABLE
event_server_timestamp_offset INTEGER NULLABLE
user_id STRING NULLABLE
user_pseudo_id STRING NULLABLE
privacy_info RECORD NULLABLE
user_properties RECORD REPEATED
user_first_touch_timestamp INTEGER NULLABLE
user_ltv RECORD NULLABLE
device RECORD NULLABLE
geo RECORD NULLABLE
app_info RECORD NULLABLE
traffic_source RECORD NULLABLE
stream_id STRING NULLABLE
platform STRING NULLABLE
event_dimensions RECORD NULLABLE
ecommerce RECORD NULLABLE
items RECORD REPEATED
collected_traffic_source RECORD NULLABLE
is_active_user BOOLEAN NULLABLE

Retention

The integration and staging GA4 datasets have a default table expiry in BigQuery of 30 days.

Processing

The raw data recieved from Google is sharded into daily tables. We process these into a partitioned table that is partitioned on event_date and clustered on event_name, we then flatten this into a partitioned flattened table.

This processing occurs within DataForm in the gds-bq-reporting project and writes data back to the ga4-analytics-352613 project. The DataForm is compiled at 7am UTC and ran at 8am UTC.

graph TD 
A[Raw sharded data from Google] --> B
B[Partitioned raw event data] 
B --> C[Partitioned flattened data] 

This page was last reviewed on 23 May 2024. It needs to be reviewed again on 23 November 2024 .
This page was set to be reviewed before 23 November 2024. This might mean the content is out of date.