Looker Studio best practice
This page is a work in progress.
You can visualise GOV.UK GA4 data in Looker Studio via the standard ‘Google Analytics’ connector or by connecting to the GA4 data stored in BigQuery.
Using the GA4 (Data API) Looker Studio connector
You can easily connect to the GA4 data in Looker Studio using the ‘Google Analytics’ connector which is built and supported by Looker Studio. This connects to the GA4 data using the Google Analytics Data API.
If you are a GDS staff member, please use the shared data connections already set up. This is set up with the correct settings, and using this connection ensures that reports will not break when the creator leaves GDS.
Further guidance on Looker Studio data connections can be found on the Looker Studio tools page.
Best practice
Cardinality and sampling
Watch out for the impact of high cardinality in your reports. Pay attention to the (other) row and the totals of tables when filtering. Data in (other) can be inadvertently filtered out and result in misleading tables or visualisations.
If you are trying to figure out whether a certain chart will be impacted by cardinality or sampling, you can use the Query Explorer.
If you input the dimensions, metrics, and filters you are using in the Query Explorer and scroll to the bottom of the JSON response tab, you will be able to see the dataLossFromOtherRow
or samplingMetadatas
.
This can be useful when you are trying to work out what’s happening in a Looker Studio chart that doesn’t look quite right, or which dimensions will give better results.
If your data appears to be impacted in a highly detrimental way by cardinality, you can force it to use sampling instead by adding an extra dimension or two. These dimensions don’t need to be visible in the chart itself - if you filter on a dimension which makes no difference to your data, it still changes which backend tables GA4 uses to process the query. Neither option is ideal, but a 7-day period sampled at about 75% is a lot more meaningful than some nonsensical metrics for any time period. Alternatively, if you want 100% accuracy, try selecting 5 days including at least a Saturday or a Sunday so it includes some representative weekend behaviour.
Regrettably, Looker Studio currently does not indicate if data from Google Analytics is sampled.
Be particularly careful when using same-day data. GA4 data on GOV.UK takes several hours to be processed, so same-day data is often very unreliable. It is usually better to set your date range to exclude today’s data, unless there is a specific reason you need to look at today.
Quotas
Be aware that Looker Studio reports that connect to Google Analytics 4 data are subject to Google Analytics Data API quotas. You can use the ‘Google Analytics token usage’ information (which you can find in the menu when you right click whilst editing a dashboard) to assess how many tokens you are using. You can also use the ‘pause updates’ feature whilst creating new charts or carrying out a lot of edits on a report to minimise the number of tokens used.
Use the BigQuery GA4 data in Looker Studio
The GA4 data stored in BigQuery can be used in Looker Studio.
If you are a member of GDS staff, please use the shared data connections.
If you are not GDS staff, we recommend you use a custom query. You can do this by:
- Clicking to add a data source
- In the ‘Connect to data’ panel, selecting ‘BigQuery’
- In the left-hand menu, selecting ‘Custom query’
- Entering your billing project ID
- Entering a query to select the fields needed using a WHERE statement to define the date range required
- Tick the checkbox to ‘enable date range parameters’ if needed
- Selecting ‘Add’
An example of a SQL query that could be used in step 5 above is:
Partitioned data source (the go-to GA4 connection)
SELECT *
FROM `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE event_date >= PARSE_DATE('%Y%m%d', @DS_START_DATE) and event_date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
Sharded data source (our legacy GA4 connection)
SELECT *
FROM `ga4-analytics-352613.flattened_dataset.flattened_daily_ga_data_*`
WHERE _TABLE_SUFFIX BETWEEN @DS_START_DATE AND @DS_END_DATE
Best practice
Where possible, use the partitioned flattened data source. The partitioned flattened tables are more efficient to query than the sharded tables, and because the fields have been flattened, they are much easier to use than the nested raw data.
If you are connecting to BigQuery data in Looker Studio, make sure to define your date range.
If you would like to use a dynamic date range with BigQuery data and are using a custom query, please enable the date range parameters in the connection so that Looker Studio only queries the dates required.
To do this you will also need to include the date range parameters in your querry - for example, with a our partitioned flattened GA4 tables (which have been partitioned on the event_date field), use WHERE event_date >= PARSE_DATE('%Y%m%d', @DS_START_DATE) and event_date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
. If you are using a data source comprised of sharded tables, the syntax is slightly different, and you would qualify your query with something like: WHERE _TABLE_SUFFIX BETWEEN @DS_START_DATE AND @DS_END_DATE
.
Use the ‘pause updates’ feature whilst creating new charts or carrying out a lot of edits on a report to minimise the number of queries run.
Avoid creating new connections to BigQuery datasets in Looker Studio as Looker Studio can query the entire dataset whilst connecting it to the Looker Studio dashboard, which can be very expensive. Use shared connections where possible.