BigQuery best practice
This page is a work in progress.
GA4 data is available in BigQuery in the raw dataset or the flattened dataset, which is stored in a partitioned table.
The raw and flattened GA4 data in BigQuery is stored as a table of events - each row, or record, represents an event. Both the raw and flattened GA4 data are stored in partitioned tables, which are more efficient to query than sharded tables. A count of all records on a day will give you the total number of events that were recorded on that day. A count of all records on a day filtered to only show records with the event name ‘page_view’ will give you the total number of page views recorded on that day. Metrics such as users and sessions need to be calculated. Some example SQL to find common GA4 metrics can be found on the ‘Find things in GA4’ page.
Querying GA4 data in BigQuery
If you are unfamiliar with BigQuery, it may help to review Google’s documentation explaining the BigQuery user interface.
To query GA4 data in BigQuery you will have to have permissions to view the data and to run queries in whichever project you are running the query in. If your query fails to run, check whether you are running the query from the right project, and check the error message to see if there was a role or permission error.
You can save queries in BigQuery to return to later.
There are quotas in place on querying in BigQuery to ensure that costs do not get too high. More information on these can be found in the quotas guidance. Details of the specific quotas set on various projects can be found under each project on the GCP projects page.
Best practice
Avoid selecting all (SELECT *
) - there are very few circumstances where you actually need every column from the data source!
If you would like to see what is in the dataset, you can PREVIEW a table in the BigQuery interface.
Where possible, use the flattened data source. The flattened tables are much more efficient to query, and should be easier to use as well.
If you are running queries in BigQuery or connecting to BigQuery data, make sure to define your date range. Using a wildcard in the place of the date at the end of the table queries all the flattened tables (the entire history of the data) and costs can rack up pretty quickly due to the amount of data we are collecting and storing. Either specify a date in the tables you are selecting data from or make sure to use a WHERE statement where you define the date (or dates) you want.
Note that using a LIMIT statement does not reduce the amount of data queried, just the amount of rows returned to you. SELECT * FROM [table] LIMIT 20
and SELECT * FROM [table]
cost the exact same amount.
Selecting dates best practice (partitioned tables)
The GOV.UK GA4 flattened data is stored in a partitioned table, which introduces some differences in how queries are structured compared to the previous method of working with sharded tables.
Querying a specific date
The data, partitioned on event_date
, requires a WHERE
clause like WHERE event_date = "2024-09-12"
Querying yesterday’s data
To select yesterday’s data, the WHERE
clause would be written as: WHERE event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
Querying dates using a DATE
variable
Finally, if your preference is to select dates using declared variables, these would now be written as follows:
DECLARE backfill_date DATE DEFAULT "2024-10-20";
SELECT *
FROM `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE event_date = backfill_date
Using GA4 data stored in BigQuery in other tools
The GA4 data stored in BigQuery can also be queried into visualisation tools or other products built to use this data.
Looker Studio connects very easily to data stored in BigQuery, and so is often used to display BigQuery data within GDS. More guidance on this can be found on the Looker Studio best practice page.