Skip to main content

Content Data App backfilling

This page contains information on checking and backfilling the Content Data app table in BigQuery.

Data checking

The following query may help you if you need to check the volume of data in the Content Data table by date.

SELECT
  the_date,
  sum(unique_page_views)
FROM
  `govuk-content-data.ga4.GA4 dataform`
GROUP BY 1
ORDER BY
1 DESC

If any dates are missing, there is no data for that date and it will need backfilling.

Backfilling

The below query will enable you to backfill the Content Data table (‘GA4 dataform’). You will need to change the ‘YYYYMMDD’ in this query to the date you need to query and append the data for. You can also change this to a BETWEEN or IN statement.

The data should be appended to the govuk-content-data.ga4.GA4 dataform table.

Once the data has been successfully appended (it is best to check this has worked using the checking query above), you will have to notify #govuk-platform-support that the data is ready to reimport.

WITH
  CTE1 AS (
  SELECT
    user_pseudo_id,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      KEY = 'search_term') AS search_term,
    SPLIT(SPLIT(REGEXP_REPLACE((
          SELECT
            value.string_value
          FROM
            UNNEST(event_params)
          WHERE
            KEY = 'page_location'), 'https://www.gov.uk', ''),'?')[SAFE_OFFSET(0)],'#')[SAFE_OFFSET(0)] AS cleaned_page_location,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      KEY = 'ui_text') AS ui_text,
    CONCAT(user_pseudo_id, (
      SELECT
        value.int_value
      FROM
        UNNEST(event_params)
      WHERE
        KEY = 'ga_session_id')) AS unique_session_id,
    (
    SELECT
      value.int_value
    FROM
      UNNEST(event_params)
    WHERE
      KEY = 'entrances') AS entrances,
    event_name,
    event_timestamp,
    FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS the_date,
    IFNULL((CAST((
          SELECT
            value.int_value
          FROM
            UNNEST(event_params)
          WHERE
            KEY = "session_engaged") AS STRING)),(
      SELECT
        value.string_value
      FROM
        UNNEST(event_params)
      WHERE
        KEY = "session_engaged")) AS session_engaged,
  IF
    (LEAD(event_name) OVER (PARTITION BY CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params)
          WHERE
            KEY = 'ga_session_id'))
      ORDER BY
        event_timestamp) IS NULL, 1, NULL) AS isExit,
  FROM
    `ga4-analytics-352613.analytics_330577055.events_*`
  WHERE
    _table_suffix = YYYYMMDD ),
  CTE2 AS (
  SELECT
    the_date,
    cleaned_page_location,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'page_view' THEN unique_session_ID
    END
      ) AS unique_page_views,
    COUNT(CASE
        WHEN event_name = 'page_view' THEN unique_session_ID
    END
      ) AS total_page_views,
    COUNT(DISTINCT
      CASE
        WHEN isexit = 1 THEN unique_session_ID
    END
      ) AS exits,
    COUNT(DISTINCT
      CASE
        WHEN entrances = 1 THEN unique_session_ID
    END
      ) AS entrances,
    COUNT(CASE
        WHEN event_name = 'search' THEN search_term
    END
      ) AS total_searches,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'search' THEN unique_session_id
    END
      ) AS unique_search_sessions,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'search' THEN search_term
    END
      ) AS unique_searchterms,
    COUNT(CASE
        WHEN event_name = 'form_submit' AND ui_text = 'Yes' THEN unique_session_id
    END
      ) AS useful_yes,
    COUNT(CASE
        WHEN event_name = 'form_submit' AND ui_text = 'No' THEN unique_session_id
    END
      ) AS useful_no,
    COUNT(DISTINCT
      CASE
        WHEN event_name = 'session_start' THEN unique_session_iD
    END
      ) AS session_starts,
    COUNT(DISTINCT
      CASE
        WHEN session_engaged = '1' THEN unique_session_id
    END
      ) AS session_engaged
  FROM
    CTE1
  GROUP BY
    cleaned_page_location,
    the_date )
SELECT
  *
FROM
  CTE2
This page was last reviewed on 28 June 2024. It needs to be reviewed again on 28 December 2024 .
This page was set to be reviewed before 28 December 2024. This might mean the content is out of date.