Skip to main content

Content Data App backfilling

This page contains information on checking and backfilling the Content Data app table in BigQuery.

Data checking

The following query may help you if you need to check the volume of data in the Content Data table by date.

sql SELECT the_date, sum(unique_page_views) FROM `govuk-content-data.ga4.GA4 dataform` GROUP BY 1 ORDER BY 1 DESC

If any dates are missing, there is no data for that date and it will need backfilling.

Backfilling

The below query will enable you to backfill the Content Data table (‘GA4 dataform’). You will need to change the ‘YYYYMMDD’ in this query to the date you need to query and append the data for. You can also change this to a BETWEEN or IN statement.

The data should be appended to the govuk-content-data.ga4.GA4 dataform table.

Once the data has been successfully appended (it is best to check this has worked using the checking query above), you will have to notify #govuk-publishing-content-modelling-dev that the data is ready to reimport.

sql WITH CTE1 AS ( SELECT user_pseudo_id, ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'search_term') AS search_term, SPLIT(SPLIT(REGEXP_REPLACE(( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location, ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'ui_text') AS ui_text, CONCAT(user_pseudo_id, ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id')) AS unique_session_id, ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'entrances') AS entrances, event_name, event_timestamp, FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date, IFNULL((CAST(( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = "session_engaged") AS STRING)),( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = "session_engaged")) AS session_engaged, IF (LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id')) ORDER BY event_timestamp) IS NULL, 1, NULL) AS isExit, FROM `ga4-analytics-352613.analytics_330577055.events_*` WHERE _table_suffix = YYYYMMDD ), CTE2 AS ( SELECT the_date, cleaned_page_location, COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN unique_session_ID END ) AS unique_page_views, COUNT(CASE WHEN event_name = 'page_view' THEN unique_session_ID END ) AS total_page_views, COUNT(DISTINCT CASE WHEN isexit = 1 THEN unique_session_ID END ) AS exits, COUNT(DISTINCT CASE WHEN entrances = 1 THEN unique_session_ID END ) AS entrances, COUNT(CASE WHEN event_name = 'search' THEN search_term END ) AS total_searches, COUNT(DISTINCT CASE WHEN event_name = 'search' THEN unique_session_id END ) AS unique_search_sessions, COUNT(DISTINCT CASE WHEN event_name = 'search' THEN search_term END ) AS unique_searchterms, COUNT(CASE WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id END ) AS useful_yes, COUNT(CASE WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id END ) AS useful_no, COUNT(DISTINCT CASE WHEN event_name = 'session_start' THEN unique_session_iD END ) AS session_starts, COUNT(DISTINCT CASE WHEN session_engaged = '1' THEN unique_session_id END ) AS session_engaged FROM CTE1 GROUP BY cleaned_page_location, the_date ) SELECT * FROM CTE2

This page was last reviewed on 22 January 2026. It needs to be reviewed again on 22 July 2026 .