Skip to main content

Content Data app

The Content Data app helps users find key metrics about the performance of pages on www.gov.uk, including:

  • page views of pages on GOV.UK
  • searches from pages on GOV.UK
  • feedback on pages on GOV.UK

The app mostly relies on GA4 data and is populated via a scheduled query and table in BigQuery, detailed below. Some additional data comes from Feedback/the support API.

Using the Content Data app

Get access to the Content Data app

The Content Data app itself requires a Signon account to access. To get access, you should contact your Content SPOC, who can request your access to Signon.

Most of the data which populates the Content data app is stored in BigQuery in the govuk-content-data project. Access to this project and dataset is currently limited to individuals that are working on the project.

Content Data data definitions

One of the key data sources supplying the Content Data app changed in late 2023, when the Content Data app was migrated from Universal Analytics (UA) to use Google Analytics 4 (GA4) data. The app previously extracted Universal Analytics data from the Google Analytics Reporting API v4, but as of December 2023 it now draws on a table of metrics stored in BigQuery, based on the raw GA4 data. This does mean the exact definition of some metrics has changed. The table below reveals potential differences when looking at Content Data over this period.

Field name Definition pre-December 2023 (UA) Definition since December 2023 (GA4) Notes
Unique page views The uniquePageviews metric returned from the API A count of distinct session IDs attached to page_view events on the page in question The figures returned will differ between UA and GA4 due to the change in the definition of a session
Page views The total number of page views the selected page received in the given period - the pageviews metric returned from the API The total number of page views the selected page received in the given period - a count of distinct session IDs attached to page_view events on the page in question Should be very similar across UA and GA4
Page views per visit The number of total page views divided by the number of unique page views, to show on average how many times a page was viewed during a user’s session The figures returned will differ between UA and GA4 due to the change in the definition of a session
Searches from page The searchUniques metric returned from the API where the page in question was the searchStartPage The number of search events sent on the page These are potentially quite different metrics
Number of feedback comments The number of Feedex comments returned from the support API No change Data from Feedex - not impacted by the migration to GA4
Users who found page useful The events with Event action ‘ffYesClick’ divided by the events with the Event action ‘ffNoClick’ and ‘ffYesClick’ returned from the API The form_submit events with ui_text ‘Yes’ divided by all the ‘Yes’ and ‘No’ form_submit events

 CSV export

You can also export a CSV of data from the app’s search results page. This contains all the metrics stored (GA4, feedback and word count/reading time) on the pages returned by your search as well as each page’s title, organisation and document type.

How the Content Data app works

The Content Data app is a Ruby on Rails app which imports page performance metrics into a data warehouse. These metrics are then exposed via an API and displayed in scorecards and charts on a website for easy use by civil servants across govermnment. Further technical documentation can be found on the Developer docs site.

Data sources

The table supplying the Content Data app with GA4 data can be found in BigQuery under the data set govuk-content-data.ga4.GA4 dataform. This is updated every day with the previous day’s data via a scheduled query (‘Daily content data’) in the same project (see the code below). This query is currently scheduled to run at 9:30am.

A script then imports yesterday’s data from the GA4 dataform table into Content Data every day at 9:40am UTC.

If the import fails for any reason, it needs to be manually re-run. The process and SQL code to used can be found on the Content Data backfilling page. Once the GA4 dataform table in BigQuery has been updated, the GOV.UK Platform Engineering team need to be alerted to re-run the import.

The scheduled query is as followed:

WITH
  CTE1 AS (
  SELECT
    user_pseudo_id,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      KEY = 'search_term') AS search_term,
    SPLIT(SPLIT(REGEXP_REPLACE((
          SELECT
            value.string_value
          FROM
            UNNEST(event_params)
          WHERE
            KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      KEY = 'ui_text') AS ui_text,
    CONCAT(user_pseudo_id, (
      SELECT
        value.int_value
      FROM
        UNNEST(event_params)
      WHERE
        KEY = 'ga_session_id')) AS unique_session_id,
    (
    SELECT
      value.int_value
    FROM
      UNNEST(event_params)
    WHERE
      KEY = 'entrances') AS entrances,
    event_name,
    event_timestamp,
    FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date,
    IFNULL((CAST((
          SELECT
            value.int_value
          FROM
            UNNEST(event_params)
          WHERE
            KEY = "session_engaged") AS STRING)),(
      SELECT
        value.string_value
      FROM
        UNNEST(event_params)
      WHERE
        KEY = "session_engaged")) AS session_engaged,
  IF
    (LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params)
          WHERE
            KEY = 'ga_session_id'))
      ORDER BY
        event_timestamp) IS NULL, 1, NULL) AS isExit,
  FROM
    `ga4-analytics-352613.analytics_330577055.events_*`
  WHERE
    _table_suffix = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) ),
  CTE2 AS (
  SELECT
    the_date,
    cleaned_page_location,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'page_view' THEN unique_session_ID
    END
      ) AS unique_page_views,
    COUNT(CASE
        WHEN event_name = 'page_view' THEN unique_session_ID
    END
      ) AS total_page_views,
    COUNT(DISTINCT
      CASE
        WHEN isexit = 1 THEN unique_session_ID
    END
      ) AS exits,
    COUNT(DISTINCT
      CASE
        WHEN entrances = 1 THEN unique_session_ID
    END
      ) AS entrances,
    COUNT(CASE
        WHEN event_name = 'search' THEN search_term
    END
      ) AS total_searches,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'search' THEN unique_session_id
    END
      ) AS unique_search_sessions,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'search' THEN search_term
    END
      ) AS unique_searchterms,
    COUNT(CASE
        WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id
    END
      ) AS useful_yes,
    COUNT(CASE
        WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id
    END
      ) AS useful_no,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'session_start' THEN unique_session_iD
    END
      ) AS session_starts,
    COUNT(DISTINCT
      CASE
        WHEN session_engaged = '1' THEN unique_session_id
    END
      ) AS session_engaged
  FROM
    CTE1
  GROUP BY
    cleaned_page_location,
    the_date )
SELECT
  *
FROM
  CTE2

GA4 dataform table schema

field name type mode
the_date STRING NULLABLE
cleaned_page_location STRING NULLABLE
unique_page_views INTEGER NULLABLE
total_page_views INTEGER NULLABLE
exits INTEGER NULLABLE
entrances INTEGER NULLABLE
total_searches INTEGER NULLABLE
unique_search_sessions INTEGER NULLABLE
unique_searchterms INTEGER NULLABLE
useful_yes INTEGER NULLABLE
useful_no INTEGER NULLABLE
session_starts INTEGER NULLABLE
session_engaged INTEGER NULLABLE
This page was last reviewed on 21 November 2024. It needs to be reviewed again on 21 May 2025 .
This page was set to be reviewed before 21 May 2025. This might mean the content is out of date.