Skip to main content

BigQuery best practice

GA4 data is available in BigQuery in the raw dataset or the flattened dataset, which is stored in a partitioned table.

The raw and flattened GA4 data in BigQuery is stored as a table of events - each row, or record, represents an event. Both the raw and flattened GA4 data are stored in partitioned tables, which are more efficient to query than sharded tables. A count of all records on a day will give you the total number of events that were recorded on that day. A count of all records on a day filtered to only show records with the event name ‘page_view’ will give you the total number of page views recorded on that day. Metrics such as users and sessions need to be calculated. Some example SQL to find common GA4 metrics can be found on the ‘Find things in GA4’ page.

Querying GA4 data in BigQuery

If you are unfamiliar with BigQuery, it may help to review Google’s documentation explaining the BigQuery user interface.

To query GA4 data in BigQuery you will have to have permissions to view the data and to run queries in whichever project you are running the query in. If your query fails to run, check whether you are running the query from the right project, and check the error message to see if there was a role or permission error.

You can save queries in BigQuery to return to later.

There are quotas in place on querying in BigQuery to ensure that costs do not get too high. More information on these can be found in the quotas guidance. Details of the specific quotas set on various projects can be found under each project on the GCP projects page.

Best practice

Do not SELECT *

Avoid selecting all (SELECT *) - there are very few circumstances where you actually need every column from the data source! If you would like to see what is in the dataset, you can PREVIEW a table in the BigQuery interface.

Use flattened data sources

Where possible, use the flattened data source. The flattened tables are much more efficient to query, and should be easier to use as well.

Always define date ranges

If you are running queries in BigQuery or connecting to BigQuery data, make sure to define your date range. Using a wildcard in the place of the date at the end of the table queries all the flattened tables (the entire history of the data) and costs can rack up pretty quickly due to the amount of data we are collecting and storing. Either specify a date in the tables you are selecting data from or make sure to use a WHERE statement where you define the date (or dates) you want.

Our partitioned tables, such as the GOV.UK GA4 flattened data, have been set up to require a WHERE clause specifying the dates being queried.

To query a specific date, the WHERE clause would look like: WHERE event_date = "2024-09-12"

To select yesterday’s data, the WHERE clause would be written as: WHERE event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)

If your preference is to select dates using declared variables, these would be written as follows:

DECLARE backfill_date DATE DEFAULT "2024-10-20";
SELECT *
FROM `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE event_date = backfill_date

A note on LIMIT

Note that using a LIMIT statement does not reduce the amount of data queried, just the amount of rows returned to you. SELECT * FROM [table] LIMIT 20 and SELECT * FROM [table] cost the exact same amount.

Using GA4 data stored in BigQuery in other tools

The GA4 data stored in BigQuery can also be queried into visualisation tools or other products built to use this data.

Using BigQuery data in Looker Studio

Looker Studio connects very easily to data stored in BigQuery, and so is often used to display BigQuery data within GDS. More guidance on this can be found on the Looker Studio best practice page.

Using the BigQuery ‘Connected Sheets’ feature

GA4 data stored in BigQuery can be accessed in Google Sheets via the Connected Sheets feature.

This can be an efficient way to query the data if you set up a custom query (similar to how you would in Looker Studio), and filter the partitioned flattened table by at least date and event name, rather than using the default method that starts by connecting to the whole table.

This page was last reviewed on 13 December 2024. It needs to be reviewed again on 13 June 2025 .
This page was set to be reviewed before 13 June 2025. This might mean the content is out of date.