BigQuery best practice
GA4 data is available in BigQuery in the raw dataset or the flattened dataset, which is stored in a partitioned table.
The raw and flattened GA4 data in BigQuery is stored as a table of events - each row, or record, represents an event. Both the raw and flattened GA4 data are stored in partitioned tables, which are more efficient to query than sharded tables. A count of all records on a day will give you the total number of events that were recorded on that day. A count of all records on a day filtered to only show records with the event name ‘page_view’ will give you the total number of page views recorded on that day. Metrics such as users and sessions need to be calculated. Some example SQL to find common GA4 metrics can be found on the ‘Find things in GA4’ page.
Querying GA4 data in BigQuery
If you are unfamiliar with BigQuery, it may help to review Google’s documentation explaining the BigQuery user interface.
To query GA4 data in BigQuery you will have to have permissions to view the data and to run queries in whichever project you are running the query in. If your query fails to run, check whether you are running the query from the right project, and check the error message to see if there was a role or permission error.
You can save queries in BigQuery to return to later.
There are quotas in place on querying in BigQuery to ensure that costs do not get too high. More information on these can be found in the quotas guidance. Details of the specific quotas set on various projects can be found under each project on the GCP projects page.
Best practice
Do not SELECT *
Avoid selecting all (SELECT *
) - there are very few circumstances where you actually need every column from the data source!
If you would like to see what is in the dataset, you can PREVIEW a table in the BigQuery interface.
Use flattened data sources
Where possible, use the flattened data source. The flattened tables are much more efficient to query, and should be easier to use as well.
Always define date ranges
If you are running queries in BigQuery or connecting to BigQuery data, make sure to define your date range.
It is best to test queries on short periods of time, such as one day, whilst still in development, to ensure you are not querying too much data unnecessarily. Costs can rack up quickly when querying long periods due to the amount of data we are collecting and storing.
Our partitioned tables, such as the GOV.UK GA4 flattened data, have been set up to require a WHERE
clause specifying the dates being queried.
To query a specific date, the WHERE
clause would look like: WHERE event_date = "2024-09-12"
To select yesterday’s data, the WHERE
clause would be written as: WHERE event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
If your preference is to select dates using declared variables, these would be written as follows:
DECLARE backfill_date DATE DEFAULT "2024-10-20";
SELECT *
FROM `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE event_date = backfill_date
When querying sharded tables, such as the raw GA4 data, you need to make sure that you either:
- specify a date or date range in the name of the table you are selecting data from, or
- use a
WHERE
statement in which you define the date (or dates) you want.
Using a wildcard in the place of the date at the end of a sharded table name queries every shard of the table (the entire history of the data).
A note on using LIMIT
Using a LIMIT
statement does not reduce the amount of data queried, just the number of rows returned to you. SELECT * FROM [table] LIMIT 20
and SELECT * FROM [table]
cost the exact same amount.
Using GA4 data stored in BigQuery in other tools
The GA4 data stored in BigQuery can also be queried into visualisation tools or other products built to use this data.
Using BigQuery data in Looker Studio
Looker Studio connects very easily to data stored in BigQuery, and so is often used to display BigQuery data within GDS. More guidance on this can be found on the Looker Studio best practice page.
Using the BigQuery ‘Connected Sheets’ feature
GA4 data stored in BigQuery can be accessed in Google Sheets via the Connected Sheets feature.
This can be an efficient way to query the data if you set up a custom query (similar to how you would in Looker Studio), and filter the partitioned flattened table by at least date and event name, rather than using the default method that starts by connecting to the whole table.