GA4 Data Processing
The raw data we get from Google Analytics is processed to make it more useful for our needs. Specifically, we transform the data from sharded tables into a partitioned table and flatten it from its initial nested structure.
The processing takes place in the Google Cloud Project https://console.cloud.google.com/welcome?project=gds-bq-reporting using Dataform.
A link to the specific Dataform repository is here: https://console.cloud.google.com/bigquery/dataform/locations/europe-west2/repositories/ga4_production_partitioning/details/workspaces?project=gds-bq-reporting
Sharded to Partitioned tables
This process takes a date-sharded table (ending YYYYMMDD) and copies it as a new partition into ga4-analytics-352613.analytics_330577055.partitioned_events.
The partitioning is done on event_date, which means that this field is transformed from a STRING into a DATE. The table is also clustered on event_name.
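As a rough illustration of the date handling described above, the following Python sketch shows how the YYYYMMDD suffix of a shard name maps to a date value. The actual transformation lives in the Dataform repository and is not reproduced here; the function name shard_suffix_to_date is hypothetical.

```python
from datetime import date, datetime


def shard_suffix_to_date(table_name: str) -> date:
    """Parse the trailing YYYYMMDD of a sharded table name, e.g. events_20240131.

    Hypothetical helper for illustration only; it is not part of the pipeline.
    """
    # Take everything after the last underscore as the shard suffix.
    suffix = table_name.rsplit("_", 1)[-1]
    # strptime raises ValueError if the suffix is not a valid YYYYMMDD date.
    return datetime.strptime(suffix, "%Y%m%d").date()
```

For example, shard_suffix_to_date("events_20240131") gives the date 2024-01-31, which is the partition that shard would land in.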
Flattened table creation
The flattened GA4 data is created from the partitioned table. It is also partitioned on event_date and clustered on event_name, and is saved as ga4-analytics-352613.flattened_dataset.partitioned_flattened_events.
Re-running the pipeline
To perform a manual execution, click the start execution button here: https://console.cloud.google.com/bigquery/dataform/locations/europe-west2/repositories/ga4_production_partitioning/details/release-scheduling?project=gds-bq-reporting
Then select the Production release configuration and all actions to run.
Updating the Partitioned table schema
Occasionally Google will change the schema of the data it sends to BigQuery. This will cause the processing to fail. To update the schema, follow the steps below.
This command produces a schema file based on what Google is now sending. Change YYYYMMDD to reflect yesterday's date:
bq show --schema --format=prettyjson ga4-analytics-352613:analytics_330577055.events_YYYYMMDD > new_schema.json
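If it helps to avoid typing the date by hand, the shard name for yesterday can be derived with a small Python helper. This is only a sketch: the function name yesterday_shard is hypothetical, and the table prefix is copied from the command above.

```python
from datetime import date, timedelta
from typing import Optional

# Prefix copied from the bq command above; the YYYYMMDD suffix is appended.
SHARD_PREFIX = "ga4-analytics-352613:analytics_330577055.events_"


def yesterday_shard(today: Optional[date] = None) -> str:
    """Return the sharded table reference for the day before `today`.

    Hypothetical helper; `today` defaults to the current local date.
    """
    if today is None:
        today = date.today()
    return SHARD_PREFIX + (today - timedelta(days=1)).strftime("%Y%m%d")
```

Calling yesterday_shard() with no arguments returns the value to use in place of the events_YYYYMMDD table reference.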
You will then need to change the type of the event_date field to DATE and save the modified file.
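The edit can be made by hand in any text editor. As an alternative, here is a minimal Python sketch that makes the same change, assuming new_schema.json holds the JSON array of field definitions written by bq show. The function names set_event_date_type and rewrite_schema_file are hypothetical.

```python
import json


def set_event_date_type(schema: list, new_type: str = "DATE") -> list:
    """Return a copy of the schema with the top-level event_date type changed.

    `schema` is the list of field dicts from the schema file.
    Hypothetical helper for illustration; the input list is not modified.
    """
    updated = []
    found = False
    for field in schema:
        field = dict(field)  # shallow copy so the caller's data is untouched
        if field.get("name") == "event_date":
            field["type"] = new_type
            found = True
        updated.append(field)
    if not found:
        raise ValueError("event_date not found in schema")
    return updated


def rewrite_schema_file(path: str = "new_schema.json") -> None:
    """Load the schema file, change event_date to DATE, and save it in place."""
    with open(path) as f:
        schema = json.load(f)
    with open(path, "w") as f:
        json.dump(set_event_date_type(schema), f, indent=2)
```

Running rewrite_schema_file() in the directory containing new_schema.json overwrites the file with the modified schema. It is worth checking the result before applying it.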
Apply the modified schema to the partitioned table:
bq update ga4-analytics-352613:analytics_330577055.partitioned_events new_schema.json
Then follow the steps in "Re-running the pipeline" above.