Content Data App backfilling
This page contains information on checking and backfilling the Content Data app table in BigQuery.
Data checking
The following query may help you if you need to check the volume of data in the Content Data table by date.
SELECT
the_date,
sum(unique_page_views)
FROM
`govuk-content-data.ga4.GA4 dataform`
GROUP BY 1
ORDER BY
1 DESC
If any dates are missing, there is no data for that date and it will need backfilling.
Backfilling
The below query will enable you to backfill the Content Data table (‘GA4 dataform’).
You will need to change the ‘YYYYMMDD’ in this query to the date you need to query and append the data for.
You can also change this to a BETWEEN
or IN
statement.
The data should be appended to the govuk-content-data.ga4.GA4 dataform
table.
Once the data has been successfully appended (it is best to check this has worked using the checking query above), you will have to notify #govuk-platform-support that the data is ready to reimport.
WITH
CTE1 AS (
SELECT
user_pseudo_id,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'search_term') AS search_term,
SPLIT(SPLIT(REGEXP_REPLACE((
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ui_text') AS ui_text,
CONCAT(user_pseudo_id, (
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ga_session_id')) AS unique_session_id,
(
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'entrances') AS entrances,
event_name,
event_timestamp,
FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date,
IFNULL((CAST((
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = "session_engaged") AS STRING)),(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = "session_engaged")) AS session_engaged,
IF
(LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params)
WHERE
KEY = 'ga_session_id'))
ORDER BY
event_timestamp) IS NULL, 1, NULL) AS isExit,
FROM
`ga4-analytics-352613.analytics_330577055.events_*`
WHERE
_table_suffix = YYYYMMDD ),
CTE2 AS (
SELECT
the_date,
cleaned_page_location,
COUNT(DISTINCT
CASE
WHEN event_name = 'page_view' THEN unique_session_ID
END
) AS unique_page_views,
COUNT(CASE
WHEN event_name = 'page_view' THEN unique_session_ID
END
) AS total_page_views,
COUNT(DISTINCT
CASE
WHEN isexit = 1 THEN unique_session_ID
END
) AS exits,
COUNT(DISTINCT
CASE
WHEN entrances = 1 THEN unique_session_ID
END
) AS entrances,
COUNT(CASE
WHEN event_name = 'search' THEN search_term
END
) AS total_searches,
COUNT(DISTINCT
CASE
WHEN event_name = 'search' THEN unique_session_id
END
) AS unique_search_sessions,
COUNT(DISTINCT
CASE
WHEN event_name = 'search' THEN search_term
END
) AS unique_searchterms,
COUNT(CASE
WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id
END
) AS useful_yes,
COUNT(CASE
WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id
END
) AS useful_no,
COUNT(DISTINCT
CASE
WHEN event_name = 'session_start' THEN unique_session_iD
END
) AS session_starts,
COUNT(DISTINCT
CASE
WHEN session_engaged = '1' THEN unique_session_id
END
) AS session_engaged
FROM
CTE1
GROUP BY
cleaned_page_location,
the_date )
SELECT
*
FROM
CTE2