
GA4 Data Processing

The raw data we get from Google Analytics is processed to make it more useful for our needs. Specifically, we transform the data from sharded tables into a partitioned table and flatten it from its initial nested structure.

The processing takes place in the Google Cloud Project https://console.cloud.google.com/welcome?project=gds-bq-reporting using Dataform.

A link to the specific Dataform repository is here: https://console.cloud.google.com/bigquery/dataform/locations/europe-west2/repositories/ga4_production_partitioning/details/workspaces?project=gds-bq-reporting

Sharded to Partitioned tables

This process takes a date-sharded table (with a name ending in YYYYMMDD) and copies it as a new partition into ga4-analytics-352613.analytics_330577055.partitioned_events. The table is partitioned on event_date, which means that this field is converted from a string into a DATE. The table is also clustered on event_name.
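As a rough illustration of the naming involved, the sketch below maps a shard name to the date it holds and to the matching partition of the destination table. The helper names are hypothetical, and the use of BigQuery's $YYYYMMDD partition decorator is an assumption made for illustration; the actual copy is performed by the Dataform repository, whose SQL is not shown on this page.

```python
from datetime import date, datetime

# Destination table named in this document.
PARTITIONED_TABLE = "ga4-analytics-352613.analytics_330577055.partitioned_events"


def shard_to_event_date(shard_table: str) -> date:
    """Parse the YYYYMMDD suffix of a sharded table name such as events_20240916."""
    suffix = shard_table.rsplit("_", 1)[-1]
    return datetime.strptime(suffix, "%Y%m%d").date()


def partition_decorator(shard_table: str) -> str:
    """Return the destination table with a $YYYYMMDD partition decorator (assumed form)."""
    event_date = shard_to_event_date(shard_table)
    return f"{PARTITIONED_TABLE}${event_date:%Y%m%d}"


print(shard_to_event_date("events_20240916"))
print(partition_decorator("events_20240916"))
```

The string-to-date parse in shard_to_event_date mirrors the type change applied to event_date, which is what allows BigQuery to partition on that column.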

Flattened table creation

The flattened GA4 data is created from the partitioned table. It is also partitioned on event_date and clustered on event_name, and is saved as ga4-analytics-352613.flattened_dataset.partitioned_flattened_events.
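To give a feel for what flattening means here, the sketch below unnests the event_params key/value records that the GA4 export stores on each event row. This page does not list the flattened columns, so the choice of event_params, the param_ column prefix and the function name are all assumptions for illustration; the production logic lives in the Dataform repository.

```python
from typing import Any

# Typed value fields used by the GA4 BigQuery export for each parameter.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def flatten_event(event: dict[str, Any]) -> dict[str, Any]:
    """Turn one nested GA4 event row into a flat dict of columns."""
    # Copy every top-level field except the nested parameter array.
    flat = {k: v for k, v in event.items() if k != "event_params"}
    for param in event.get("event_params", []):
        value = param.get("value", {})
        # Take the first typed value field that is populated, or None if none is.
        flat[f"param_{param['key']}"] = next(
            (value[f] for f in VALUE_FIELDS if value.get(f) is not None),
            None,
        )
    return flat


row = {
    "event_date": "20240916",
    "event_name": "page_view",
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://www.gov.uk/"}},
        {"key": "ga_session_id", "value": {"int_value": 123}},
    ],
}
print(flatten_event(row))
```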

Re-running the pipeline

To perform a manual execution, click the start execution button at https://console.cloud.google.com/bigquery/dataform/locations/europe-west2/repositories/ga4_production_partitioning/details/release-scheduling?project=gds-bq-reporting, then select the Production release configuration and choose to run all actions.

Updating the Partitioned table schema

Occasionally Google will change the schema of the data it sends to BigQuery. This will cause the processing to fail. To update the schema run the following:

This command produces a schema file based on what Google is now sending. Change YYYYMMDD to yesterday's date.

bq show --schema --format=prettyjson ga4-analytics-352613:analytics_330577055.events_YYYYMMDD > new_schema.json
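If it helps to avoid typing the date by hand, yesterday's shard name could be computed with a small helper along these lines. The function name is hypothetical, and the sketch assumes the shard for yesterday has already been exported.

```python
from datetime import date, timedelta
from typing import Optional

# Source dataset in the project:dataset form used by the bq commands in this document.
SOURCE_DATASET = "ga4-analytics-352613:analytics_330577055"


def yesterday_shard(today: Optional[date] = None) -> str:
    """Return the fully qualified name of yesterday's events_YYYYMMDD shard."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return f"{SOURCE_DATASET}.events_{yesterday:%Y%m%d}"


print(yesterday_shard())
```

Note that date.today() uses the local date of the machine running the script, which may differ from the reporting time zone of the GA4 property.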

You will then need to change the type of the event_date field in new_schema.json from STRING to DATE and save the modified file.
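The edit can be made by hand, but a minimal sketch of scripting it is shown here. It assumes new_schema.json is the JSON array of field definitions written by bq show, with event_date as a top-level field; the function names are hypothetical.

```python
import json


def set_event_date_type(schema: list[dict]) -> list[dict]:
    """Return a copy of a BigQuery schema with the top-level event_date field typed as DATE."""
    return [
        {**field, "type": "DATE"} if field.get("name") == "event_date" else field
        for field in schema
    ]


def rewrite_schema_file(path: str) -> None:
    """Load the schema file at path, retype event_date and save it back in place."""
    with open(path, encoding="utf-8") as f:
        schema = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(set_event_date_type(schema), f, indent=2)
```

Calling rewrite_schema_file("new_schema.json") would update the file in place, leaving every other field as Google sent it.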

Apply the modified schema to the partitioned table:

bq update ga4-analytics-352613:analytics_330577055.partitioned_events new_schema.json

Then follow the steps under "Re-running the pipeline" above.

This page was last reviewed on 17 September 2024. It needs to be reviewed again on 17 March 2025.