Skip to main content

Content Data app

The Content Data app helps users find key metrics about the performance of pages on www.gov.uk, including:

  • page views of pages on GOV.UK
  • searches from pages on GOV.UK
  • feedback on pages on GOV.UK

The app mostly relies on GA4 data and is populated via a scheduled query and table in BigQuery, detailed below. Some additional data comes from Feedback/the support API.

Using the Content Data app

### Get access to the Content Data app

The Content Data app itself requires a Signon account to access. To get access, you should contact your Content SPOC, who can request your access to Signon.

Most of the data which populates the Content data app is stored in BigQuery in the govuk-content-data project. Access to this project and dataset is currently limited to individuals that are working on the project.

Content Data data definitions

One of the key data sources supplying the Content Data app changed in late 2023, when the Content Data app was migrated from Universal Analytics (UA) to use Google Analytics 4 (GA4) data. The app previously extracted Universal Analytics data from the Google Analytics Reporting API v4, but as of December 2023 it now draws on a table of metrics stored in BigQuery, based on the raw GA4 data. This does mean the exact definition of some metrics has changed. The table below reveals potential differences when looking at Content Data over this period.

Field name Definition pre-December 2023 (UA) Definition since December 2023 (GA4) Notes
Unique page views The uniquePageviews metric returned from the API A count of distinct session IDs attached to page_view events on the page in question The figures returned will differ between UA and GA4 due to the change in the definition of a session
Page views The total number of page views the selected page received in the given period - the pageviews metric returned from the API The total number of page views the selected page received in the given period - a count of session IDs attached to page_view events on the page in question Should be very similar across UA and GA4
Page views per visit   The number of total page views divided by the number of unique page views, to show on average how many times a page was viewed during a user’s session The figures returned will differ between UA and GA4 due to the change in the definition of a session
Searches from page The searchUniques metric returned from the API where the page in question was the searchStartPage The number of search events sent on the page These are potentially quite different metrics
Number of feedback comments The number of Feedex comments returned from the support API No change Data from Feedex - not impacted by the migration to GA4
Users who found page useful The events with Event action ‘ffYesClick’ divided by the events with the Event action ‘ffNoClick’ and ‘ffYesClick’ returned from the API The form_submit events with ui_text ‘Yes’ divided by all the ‘Yes’ and ‘No’ form_submit events  

 CSV export

You can also export a CSV of data from the app’s search results page. This contains all the metrics stored (GA4, feedback and word count/reading time) on the pages returned by your search as well as each page’s title, organisation and document type.

How the Content Data app works

The Content Data app is a Ruby on Rails app which imports page performance metrics into a data warehouse. These metrics are then exposed via an API and displayed in scorecards and charts on a website for easy use by civil servants across govermnment. Further technical documentation can be found on the Developer docs site.

Data sources

The table supplying the Content Data app with GA4 data can be found in BigQuery under the data set govuk-content-data.ga4.GA4 dataform. This is updated every day with the previous day’s data via a scheduled query (‘Daily content data’) in the same project (see the code below). This query is currently scheduled to run at 9:30am.

A script then imports yesterday’s data from the GA4 dataform table into Content Data every day at 9:40am UTC.

If the import fails for any reason, it needs to be manually re-run. The process and SQL code to used can be found on the Content Data backfilling page. Once the GA4 dataform table in BigQuery has been updated, the GOV.UK Platform Engineering team need to be alerted to re-run the import.

The scheduled query is as followed:

SQL WITH CTE1 AS ( SELECT user_pseudo_id, ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'search_term') AS search_term, SPLIT(SPLIT(REGEXP_REPLACE(( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location, ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'ui_text') AS ui_text, CONCAT(user_pseudo_id, ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id')) AS unique_session_id, ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'entrances') AS entrances, event_name, event_timestamp, FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date, IFNULL((CAST(( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = "session_engaged") AS STRING)),( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = "session_engaged")) AS session_engaged, IF (LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id')) ORDER BY event_timestamp) IS NULL, 1, NULL) AS isExit, FROM `ga4-analytics-352613.analytics_330577055.events_*` WHERE _table_suffix = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) ), CTE2 AS ( SELECT the_date, cleaned_page_location, COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN unique_session_ID END ) AS unique_page_views, COUNT(CASE WHEN event_name = 'page_view' THEN unique_session_ID END ) AS total_page_views, COUNT(DISTINCT CASE WHEN isexit = 1 THEN unique_session_ID END ) AS exits, COUNT(DISTINCT CASE WHEN entrances = 1 THEN unique_session_ID END ) AS entrances, COUNT(CASE WHEN event_name = 'search' THEN search_term END ) AS total_searches, COUNT(DISTINCT CASE WHEN event_name = 'search' THEN unique_session_id END ) AS unique_search_sessions, COUNT(DISTINCT CASE WHEN event_name = 'search' THEN search_term END ) AS unique_searchterms, COUNT(CASE WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id END ) AS useful_yes, COUNT(CASE WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id END ) AS useful_no, COUNT(DISTINCT CASE WHEN event_name = 'session_start' THEN unique_session_iD END ) AS session_starts, COUNT(DISTINCT CASE WHEN session_engaged = '1' THEN unique_session_id END ) AS session_engaged FROM CTE1 GROUP BY cleaned_page_location, the_date ) SELECT * FROM CTE2

GA4 dataform table schema

| field name | type | mode | | — | — | — | | the_date | STRING | NULLABLE | | cleaned_page_location | STRING | NULLABLE | | unique_page_views | INTEGER | NULLABLE | | total_page_views | INTEGER | NULLABLE | | exits | INTEGER | NULLABLE | | entrances | INTEGER | NULLABLE | | total_searches | INTEGER | NULLABLE | | unique_search_sessions | INTEGER | NULLABLE | | unique_searchterms | INTEGER | NULLABLE | | useful_yes | INTEGER | NULLABLE | | useful_no | INTEGER | NULLABLE | | session_starts | INTEGER | NULLABLE | | session_engaged | INTEGER | NULLABLE |

This page was last reviewed on 25 February 2025. It needs to be reviewed again on 25 August 2025 .