GOV.UK GA4 (BigQuery export)
We export our GOV.UK Google Analytics data and store it in Google BigQuery to enable more detailed analysis than is possible in the GA4 user interface. This data can also be queried and used in other tools.
A flattened dataset is created from the raw production GOV.UK data. This contains most, but not all, of the same fields as the raw data. The flattened dataset is easier and more efficient to query, so should be used for most analysis and reporting.
Access
Everyone with a @digital.cabinet-office.gov.uk account (all GDS staff) has access to this data by default.
In addition, everyone who requests and is granted access to the GOV.UK GA4 data will be given access to query this dataset. More information can be found in our GA access policy.
Location
There are 3 raw GOV.UK GA4 datasets, which correspond to the integration, staging, and production GOV.UK websites. All of these datasets are in the GA4 Analytics project. These datasets are:
- the
ga4-analytics-352613.analytics_330577055
dataset, which holds raw GA4 data for the GOV.UK production site - the
ga4-analytics-352613.analytics_330580593
dataset, which holds raw GA4 data for the GOV.UK staging site - the
ga4-analytics-352613.analytics_294475112
dataset, which holds raw GA4 data for the GOV.UK integration site
These datasets are all comprised of date sharded tables - a new table is created each day with the suffix YYYYMMDD.
Set-up
Data collection and processing
Our Google Analytics properties export GOV.UK data several times a day.
The data for the current day is temporarily stored in intraday tables.
At the end of the day, BigQuery automatically moves the data in the intraday tables to a date table (suffixed YYYYMMDD
) and deletes the intraday tables in question.
New intraday tables are created and added to throughout the next day.
Schema
These tables use the default GA4 BigQuery Export schema:
field name | type | mode |
---|---|---|
event_date | STRING | NULLABLE |
event_timestamp | INTEGER | NULLABLE |
event_name | STRING | NULLABLE |
event_params | RECORD | REPEATED |
event_previous_timestamp | INTEGER | NULLABLE |
event_value_in_usd | FLOAT | NULLABLE |
event_bundle_sequence_id | INTEGER | NULLABLE |
event_server_timestamp_offset | INTEGER | NULLABLE |
user_id | STRING | NULLABLE |
user_pseudo_id | STRING | NULLABLE |
privacy_info | RECORD | NULLABLE |
user_properties | RECORD | REPEATED |
user_first_touch_timestamp | INTEGER | NULLABLE |
user_ltv | RECORD | NULLABLE |
device | RECORD | NULLABLE |
geo | RECORD | NULLABLE |
app_info | RECORD | NULLABLE |
traffic_source | RECORD | NULLABLE |
stream_id | STRING | NULLABLE |
platform | STRING | NULLABLE |
event_dimensions | RECORD | NULLABLE |
ecommerce | RECORD | NULLABLE |
items | RECORD | REPEATED |
collected_traffic_source | RECORD | NULLABLE |
is_active_user | BOOLEAN | NULLABLE |
Retention
The integration and staging GA4 datasets have a default table expiry in BigQuery of 30 days. The production dataset does not currently have a set table expiry.