Content Data app
The Content Data app helps users find key metrics about the performance of pages on www.gov.uk, including:
- page views of pages on GOV.UK
- searches from pages on GOV.UK
- feedback on pages on GOV.UK
The app mostly relies on GA4 data and is populated via a scheduled query and table in BigQuery, detailed below. Some additional data comes from Feedback/the support API.
Using the Content Data app
Get access to the Content Data app
The Content Data app itself requires a Signon account to access. To get access, you should contact your Content SPOC, who can request your access to Signon.
Most of the data which populates the Content data app is stored in BigQuery in the govuk-content-data project. Access to this project and dataset is currently limited to individuals that are working on the project.
Content Data data definitions
One of the key data sources supplying the Content Data app changed in late 2023, when the Content Data app was migrated from Universal Analytics (UA) to use Google Analytics 4 (GA4) data. The app previously extracted Universal Analytics data from the Google Analytics Reporting API v4, but as of December 2023 it now draws on a table of metrics stored in BigQuery, based on the raw GA4 data. This does mean the exact definition of some metrics has changed. The table below reveals potential differences when looking at Content Data over this period.
Field name | Definition pre-December 2023 (UA) | Definition since December 2023 (GA4) | Notes |
---|---|---|---|
Unique page views | The uniquePageviews metric returned from the API
|
A count of distinct session IDs attached to page_view events on the page in question | The figures returned will differ between UA and GA4 due to the change in the definition of a session |
Page views | The total number of page views the selected page received in the given period - the pageviews metric returned from the API
|
The total number of page views the selected page received in the given period - a count of distinct session IDs attached to page_view events on the page in question | Should be very similar across UA and GA4 |
Page views per visit | The number of total page views divided by the number of unique page views, to show on average how many times a page was viewed during a user’s session | The figures returned will differ between UA and GA4 due to the change in the definition of a session | |
Searches from page | The searchUniques metric returned from the API where the page in question was the searchStartPage
|
The number of search events sent on the page |
These are potentially quite different metrics |
Number of feedback comments | The number of Feedex comments returned from the support API | No change | Data from Feedex - not impacted by the migration to GA4 |
Users who found page useful | The events with Event action ‘ffYesClick’ divided by the events with the Event action ‘ffNoClick’ and ‘ffYesClick’ returned from the API
|
The form_submit events with ui_text ‘Yes’ divided by all the ‘Yes’ and ‘No’ form_submit events |
CSV export
You can also export a CSV of data from the app’s search results page. This contains all the metrics stored (GA4, feedback and word count/reading time) on the pages returned by your search as well as each page’s title, organisation and document type.
How the Content Data app works
The Content Data app is a Ruby on Rails app which imports page performance metrics into a data warehouse. These metrics are then exposed via an API and displayed in scorecards and charts on a website for easy use by civil servants across govermnment. Further technical documentation can be found on the Developer docs site.
Data sources
The table supplying the Content Data app with GA4 data can be found in BigQuery under the data set govuk-content-data.ga4.GA4 dataform
.
This is updated every day with the previous day’s data via a scheduled query (‘Daily content data’) in the same project (see the code below).
This query is currently scheduled to run at 9:30am.
A script then imports yesterday’s data from the GA4 dataform
table into Content Data every day at 9:40am UTC.
If the import fails for any reason, it needs to be manually re-run. The process and SQL code to used can be found on the Content Data backfilling page.
Once the GA4 dataform
table in BigQuery has been updated, the GOV.UK Platform Engineering team need to be alerted to re-run the import.
The scheduled query is as followed:
WITH
CTE1 AS (
SELECT
user_pseudo_id,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'search_term') AS search_term,
SPLIT(SPLIT(REGEXP_REPLACE((
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ui_text') AS ui_text,
CONCAT(user_pseudo_id, (
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ga_session_id')) AS unique_session_id,
(
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'entrances') AS entrances,
event_name,
event_timestamp,
FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date,
IFNULL((CAST((
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = "session_engaged") AS STRING)),(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = "session_engaged")) AS session_engaged,
IF
(LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params)
WHERE
KEY = 'ga_session_id'))
ORDER BY
event_timestamp) IS NULL, 1, NULL) AS isExit,
FROM
`ga4-analytics-352613.analytics_330577055.events_*`
WHERE
_table_suffix = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) ),
CTE2 AS (
SELECT
the_date,
cleaned_page_location,
COUNT(DISTINCT
CASE
WHEN event_name = 'page_view' THEN unique_session_ID
END
) AS unique_page_views,
COUNT(CASE
WHEN event_name = 'page_view' THEN unique_session_ID
END
) AS total_page_views,
COUNT(DISTINCT
CASE
WHEN isexit = 1 THEN unique_session_ID
END
) AS exits,
COUNT(DISTINCT
CASE
WHEN entrances = 1 THEN unique_session_ID
END
) AS entrances,
COUNT(CASE
WHEN event_name = 'search' THEN search_term
END
) AS total_searches,
COUNT(DISTINCT
CASE
WHEN event_name = 'search' THEN unique_session_id
END
) AS unique_search_sessions,
COUNT(DISTINCT
CASE
WHEN event_name = 'search' THEN search_term
END
) AS unique_searchterms,
COUNT(CASE
WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id
END
) AS useful_yes,
COUNT(CASE
WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id
END
) AS useful_no,
COUNT(DISTINCT
CASE
WHEN event_name = 'session_start' THEN unique_session_iD
END
) AS session_starts,
COUNT(DISTINCT
CASE
WHEN session_engaged = '1' THEN unique_session_id
END
) AS session_engaged
FROM
CTE1
GROUP BY
cleaned_page_location,
the_date )
SELECT
*
FROM
CTE2
GA4 dataform table schema
field name | type | mode |
---|---|---|
the_date | STRING | NULLABLE |
cleaned_page_location | STRING | NULLABLE |
unique_page_views | INTEGER | NULLABLE |
total_page_views | INTEGER | NULLABLE |
exits | INTEGER | NULLABLE |
entrances | INTEGER | NULLABLE |
total_searches | INTEGER | NULLABLE |
unique_search_sessions | INTEGER | NULLABLE |
unique_searchterms | INTEGER | NULLABLE |
useful_yes | INTEGER | NULLABLE |
useful_no | INTEGER | NULLABLE |
session_starts | INTEGER | NULLABLE |
session_engaged | INTEGER | NULLABLE |