Content Data app
The Content Data app summarises and displays key statistics on the performance of pages on www.gov.uk.
The app mostly relies on GA4 data and is populated via a scheduled query and table in BigQuery, detailed below.
Access
The Content Data app itself requires a Signon account to access.
The data which populates the Content data app is stored in BigQuery in the govuk-content-data project. Access to this project and dataset is currently limited to individuals that are working on the project.
Location
The table directly connected to the tool can be found in BigQuery under the data set govuk-content-data.ga4.GA4 dataform
.
This table can be found in BigQuery under the data set govuk-content-data.ga4.ga4_two_years
.
For more information on our Google Cloud Platform projects, see our GCP project documentation.
Schema
field name | type | mode |
---|---|---|
the_date | STRING | NULLABLE |
cleaned_page_location | STRING | NULLABLE |
unique_page_views | INTEGER | NULLABLE |
total_page_views | INTEGER | NULLABLE |
exits | INTEGER | NULLABLE |
entrances | INTEGER | NULLABLE |
total_searches | INTEGER | NULLABLE |
unique_search_sessions | INTEGER | NULLABLE |
unique_searchterms | INTEGER | NULLABLE |
useful_yes | INTEGER | NULLABLE |
useful_no | INTEGER | NULLABLE |
session_starts | INTEGER | NULLABLE |
session_engaged | INTEGER | NULLABLE |
Code for pulling in GA4 data
Presently, the table which populates the Content Data app is generated via a scheduled query. The code for this query can be found below.
WITH
CTE1 AS (
SELECT
user_pseudo_id,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'search_term') AS search_term,
SPLIT(SPLIT(REGEXP_REPLACE((
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ui_text') AS ui_text,
CONCAT(user_pseudo_id, (
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ga_session_id')) AS unique_session_id,
(
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'entrances') AS entrances,
event_name,
event_timestamp,
FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date,
IFNULL((CAST((
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = "session_engaged") AS STRING)),(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = "session_engaged")) AS session_engaged,
IF
(LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params)
WHERE
KEY = 'ga_session_id'))
ORDER BY
event_timestamp) IS NULL, 1, NULL) AS isExit,
FROM
`ga4-analytics-352613.analytics_330577055.events_*`
WHERE
_table_suffix = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) ),
CTE2 AS (
SELECT
the_date,
cleaned_page_location,
COUNT(DISTINCT
CASE
WHEN event_name = 'page_view' THEN unique_session_ID
END
) AS unique_page_views,
COUNT(CASE
WHEN event_name = 'page_view' THEN unique_session_ID
END
) AS total_page_views,
COUNT(DISTINCT
CASE
WHEN isexit = 1 THEN unique_session_ID
END
) AS exits,
COUNT(DISTINCT
CASE
WHEN entrances = 1 THEN unique_session_ID
END
) AS entrances,
COUNT(CASE
WHEN event_name = 'search' THEN search_term
END
) AS total_searches,
COUNT(DISTINCT
CASE
WHEN event_name = 'search' THEN unique_session_id
END
) AS unique_search_sessions,
COUNT(DISTINCT
CASE
WHEN event_name = 'search' THEN search_term
END
) AS unique_searchterms,
COUNT(CASE
WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id
END
) AS useful_yes,
COUNT(CASE
WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id
END
) AS useful_no,
COUNT(DISTINCT
CASE
WHEN event_name = 'session_start' THEN unique_session_iD
END
) AS session_starts,
COUNT(DISTINCT
CASE
WHEN session_engaged = '1' THEN unique_session_id
END
) AS session_engaged
FROM
CTE1
GROUP BY
cleaned_page_location,
the_date )
SELECT
*
FROM
CTE2