Skip to main content

GA4 Data Processing

The raw data we get from Google Analytics is processed to make it more useful for our needs. Specifically we transform the data from sharded tables to a partitioned table and also flatten the data from it’s initial nested structure.

The processing takes place in the Google Cloud Project https://console.cloud.google.com/welcome?project=gds-bq-reporting using Dataform.

A link to the specific Dataform repository is here: https://console.cloud.google.com/bigquery/dataform/locations/europe-west2/repositories/ga4_production_partitioning/details/workspaces?project=gds-bq-reporting

Sharded to Partitioned tables

This process takes a date sharded table (ending YYYYMMDD) and copies it as a new partition into ga4-analytics-352613.analytics_330577055.partitioned_events. The partitioning is done on event_date which means that this field is transformed from a string into a timestamp. The table is also clustered on event_name

Flattened table creation

The flattened GA4 data is created from the partitioned table and it is also partitioned on event_date and clustered on event_name and is saved as ga4-analytics-352613.flattened_dataset.partitioned_flattened_events

Re-running the pipeline

If the raw data arrives after 8am UTC then data will be missing until the pipeline has been manually run. To perfrom a manual execution click on the start execution button here https://console.cloud.google.com/bigquery/dataform/locations/europe-west2/repositories/ga4_production_partitioning/details/release-scheduling?project=gds-bq-reporting and select the Production release configuration and all actions to run.

This page was last reviewed on 30 July 2024. It needs to be reviewed again on 30 January 2025 .
This page was set to be reviewed before 30 January 2025. This might mean the content is out of date.