GOV.UK GA4 (BigQuery) backfill processes
Each morning we receive yesterday’s GOV.UK GA4 data in BigQuery.
The sending of this data is administered by Google, and on occasion the data can arrive relatively late in the day. When this happens, several of the processes we use to transform the data fail, and the transformed data needs to be backfilled once the raw data does arrive.
This page aims to summarise the backfilling processes undertaken by the GOV.UK Insights and Analytics team when GOV.UK GA4 data arrives late.
Flattened sharded GA4 data
To backfill the flattened sharded GA4 data, you need to:
- First, delete the affected empty sharded table from the GA4 flattened dataset
- Run the flattened sharded backfill SQL query, making sure to change the date at the top of the query
- Once the query has run, view the results and save them to a BigQuery table in the ‘ga4-analytics-352613.flattened_dataset’ dataset. The name of the backfilled table should be ‘flattened_daily_ga_data_YYYYMMDD’, with YYYYMMDD corresponding to the date of your backfilled data, e.g. ‘flattened_daily_ga_data_20240902’
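The naming convention and the delete-then-save steps above could be scripted roughly as follows. This is a minimal sketch, not the team's actual tooling: it assumes the `google-cloud-bigquery` Python client is installed and authenticated, the function names are hypothetical, and `backfill_sql` stands for the text of the flattened sharded backfill query with its date already changed.

```python
from datetime import date

FLATTENED_DATASET = "ga4-analytics-352613.flattened_dataset"
TABLE_PREFIX = "flattened_daily_ga_data_"


def flattened_table_id(day: date) -> str:
    """Return the fully qualified shard name for the given date."""
    return f"{FLATTENED_DATASET}.{TABLE_PREFIX}{day:%Y%m%d}"


def backfill_flattened(day: date, backfill_sql: str) -> None:
    """Hypothetical helper: delete the empty shard, then save query results to it."""
    # Imported lazily so the naming helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumption: default credentials and project
    table_id = flattened_table_id(day)

    # Step 1: delete the empty shard left behind by the failed run.
    client.delete_table(table_id, not_found_ok=True)

    # Steps 2 and 3: run the backfill query and save its results to the shard.
    job_config = bigquery.QueryJobConfig(
        destination=table_id,
        # Guard: fail rather than overwrite if a table with data exists.
        write_disposition=bigquery.WriteDisposition.WRITE_EMPTY,
    )
    client.query(backfill_sql, job_config=job_config).result()
```

For example, `flattened_table_id(date(2024, 9, 2))` returns `ga4-analytics-352613.flattened_dataset.flattened_daily_ga_data_20240902`. Writing straight to a destination table is an alternative to saving the results by hand in the BigQuery console; either route should produce the same table.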
Pogo-sticking dashboard
The data for the pogo-sticking dashboard relies on the flattened GOV.UK GA4 data arriving on time. If that data arrives late, the dashboard data will need to be backfilled. Only steps 8, 9 and 10 need to be backfilled for the dashboard to function.
The process for backfilling the data for this dashboard is as follows:
- First, delete the affected empty tables from steps 8, 9 and 10 in the ‘gds-bq-data.GA4_PogoSticking’ dataset
- Run the backfill queries for these steps (changing to the correct date at the beginning of each query) and save the results with the correct table names to their corresponding tables in the same dataset. For example, if you were backfilling the data for step 9 for the 2nd of September 2024, you would run the backfill query and save the results to the ‘gds-bq-data’ project, the ‘GA4_PogoSticking’ dataset, and your table name would be ‘PogoStickStepNine_20240902’
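As a quick check on table names before saving results, the targets for a given date can be listed with a small helper. This is a sketch under assumptions: only the step 9 name (‘PogoStickStepNine_YYYYMMDD’) is confirmed on this page, so the step 8 and step 10 prefixes below are guesses that follow the same pattern and should be checked against the existing tables in the dataset. The function names are hypothetical.

```python
from datetime import date

POGO_DATASET = "gds-bq-data.GA4_PogoSticking"

# Only the step 9 prefix is confirmed by this page.
# ASSUMPTION: steps 8 and 10 follow the same naming pattern.
STEP_PREFIXES = {
    8: "PogoStickStepEight",
    9: "PogoStickStepNine",
    10: "PogoStickStepTen",
}


def pogo_table_id(step: int, day: date) -> str:
    """Return the fully qualified table name for one step and date."""
    if step not in STEP_PREFIXES:
        raise ValueError(f"Only steps {sorted(STEP_PREFIXES)} need backfilling")
    return f"{POGO_DATASET}.{STEP_PREFIXES[step]}_{day:%Y%m%d}"


def pogo_backfill_targets(day: date) -> list[str]:
    """List every table to delete and re-create for the given date."""
    return [pogo_table_id(step, day) for step in sorted(STEP_PREFIXES)]
```

For the example above, `pogo_table_id(9, date(2024, 9, 2))` returns `gds-bq-data.GA4_PogoSticking.PogoStickStepNine_20240902`.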
Partitioned tables
The two partitioned tables built from GOV.UK GA4 data have been set up to automatically backfill, even if the raw data arrives late.
More information on how these tables are generated can be read on the GOV.UK GA4 BigQuery data processing page, and the queries used can be viewed on GitHub.