Skip to main content

GOV.UK GA4 data alert processes

There are a number of ways we monitor the quality and integrity of the GOV.UK GA4 data. These include basic alerts which tell us whether or not yesterday’s data has arrived and has been processed in BigQuery, as well as a number of data quality/annotation dashboards.

This page focuses on a separate, more detailed data alerts process which notifies us of more specific changes to the GOV.UK GA4, and whether or not these changes fall outside of expected bounds.

The check runs each day, examining the previous day’s data and sends the results by email to the GOVUK GA4 Alerts Google group.

How has the alerts process been set up?

The data alert check is a multi step process, starting with running queries against the data in BigQuery. The results of these queries are then exported into Google sheets, and from here an Apps Script runs which sends alerts based on particular thresholds/parameters.

Here is the process is more detail:

Step 1: BigQuery

There are six queries which run each day to check yesterday’s GOV.UK GA4 data. They query either the flattened partitioned table, or the partitioned nested tables (depending on the query).

They run as scheduled queries in the gds-bq-reporting project, and each output results to tables which are overwritten each day.

The queries are as following

Query 1: Basic checks

This query simply outputs yesterday’s event date, and whether or not it exists in the partitioned flattened table. It also calculates the previous day’s event count, and another column with the average event count over the last two weeks for either a weekend day, a friday, or any other week day for comparison. The query is:

-- yesterday exists checker 
with yesterday_check as (
SELECT
DISTINCT
event_date,
  CASE
    WHEN event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    THEN 'Yes' ELSE 'No'
    END AS date_exists
FROM
  `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
  event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),

  - yesterday total event count

  total_event_count AS (
  SELECT
    count (*) AS date_event_count
  FROM
    `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
  WHERE
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)),

  -- average event count last two weeks for week days
  week_day_average_event_count AS (
  SELECT
    ROUND(AVG(event_count),0) AS average_event_count_weekday
  FROM (
    SELECT
      count (*) AS event_count
    FROM
      `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()
      AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
    GROUP BY
      event_date ) ),

  -- average event count last two weeks for weekends
  weekend_average_event_count as (
SELECT
  ROUND(AVG(event_count),0) AS average_event_count_weekend
FROM (
  SELECT
    count (*) AS event_count
  FROM
    `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
 WHERE 
    event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()  
    AND (EXTRACT(DAYOFWEEK FROM event_date) = 1 OR EXTRACT(DAYOFWEEK FROM event_date) = 7)
  GROUP BY
    event_date)),

    friday_average_event_count as (
SELECT
  ROUND(AVG(event_count),0) AS average_event_count_friday
FROM (
  SELECT
    count (*) AS event_count
  FROM
    `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
 WHERE 
    event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()  
    AND (EXTRACT(DAYOFWEEK FROM event_date) = 6)
  GROUP BY
    event_date))

  -- selects the event date, whether or not it exists, the count of events, and the average count of events over the last two weeks  

  SELECT
  yesterday_check.event_date,
  yesterday_check.date_exists,
  total_event_count.date_event_count,
  CASE 
    WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
      THEN friday_average_event_count.average_event_count_friday
    WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5
      THEN week_day_average_event_count.average_event_count_weekday
    ELSE weekend_average_event_count.average_event_count_weekend
  END AS average_event_count_last_two_weeks,
FROM
  total_event_count,
  week_day_average_event_count,
  weekend_average_event_count,
  friday_average_event_count,
  yesterday_check

Query 2: Event data checks

This query looks at the distinct event names which appeared in the data for the previous day, and compares those to the events which appeared in the GA4 data two days earlier. This query helps us to identify whether events have appeared or disappeared unexpectedly between any two days. The query is:


WITH event_counts_two_days_ago AS (
  SELECT
    event_name AS distinct_event_name,
    COUNT(*) AS event_count_two_days_ago
  FROM
     `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
  WHERE
    --event_date = '2024-04-07'
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
  GROUP BY
    event_name
),
event_counts_yesterday AS (
  SELECT
    event_name AS distinct_event_name,
    COUNT(*) AS event_count_yesterday
  FROM
      `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
  WHERE
    --event_date = '2024-04-08'
     event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY
    event_name
)

SELECT
  COALESCE(yesterday.distinct_event_name, two_days_ago.distinct_event_name) AS event_name,
  COALESCE(event_count_yesterday, 0) AS event_count_yesterday,
  COALESCE(event_count_two_days_ago, 0) AS event_count_two_days_ago,
  CASE
    WHEN event_count_yesterday IS NULL THEN 'Disappeared'
    WHEN event_count_two_days_ago IS NULL THEN 'New'
    ELSE 'Exists'
  END AS event_check
FROM
  event_counts_yesterday AS yesterday
FULL OUTER JOIN
  event_counts_two_days_ago AS two_days_ago
ON
  yesterday.distinct_event_name = two_days_ago.distinct_event_name
ORDER BY
  event_check, event_name;

Query 3: Event count average checks

This query returns the event count for the previous day, and another column comparing it to the average event count over the last two weeks for either a weekend day, a friday, or any other week day for comparison. The query is:

WITH event_counts_week_day_average AS (
  SELECT 
    event_name, 
    ROUND(AVG(daily_count), 0) AS average_event_count_week_day
  FROM (
    SELECT
      event_name,
      COUNT(*) AS daily_count
    FROM
      `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
    GROUP BY
      event_name, event_date
  )
  GROUP BY event_name
),

event_counts_friday_average AS (
  SELECT 
    event_name, 
    ROUND(AVG(daily_count), 0) AS average_event_count_friday
  FROM (
    SELECT
      event_name,
      COUNT(*) AS daily_count
    FROM
      `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) = 6
    GROUP BY
      event_name, event_date
  )
  GROUP BY event_name
),

event_counts_weekend_average AS (
  SELECT 
    event_name, 
    ROUND(AVG(daily_count), 0) AS average_event_count_weekend
  FROM (
    SELECT
      event_name,
      COUNT(*) AS daily_count
    FROM
      `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND (EXTRACT(DAYOFWEEK FROM event_date) = 1 OR EXTRACT(DAYOFWEEK FROM event_date) = 7)
    GROUP BY
      event_name, event_date
  )
  GROUP BY event_name
),

event_counts_yesterday AS (
  SELECT
    event_name,
    COUNT(*) AS event_count_yesterday
  FROM
    `ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
  WHERE
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY
    event_name
)

SELECT
  event_counts_yesterday.event_name,
  event_counts_yesterday.event_count_yesterday,
  CASE 
  WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
  THEN event_counts_friday_average.average_event_count_friday
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5 
    THEN event_counts_week_day_average.average_event_count_week_day
    ELSE event_counts_weekend_average.average_event_count_weekend
  END AS average_event_count
FROM 
  event_counts_yesterday
LEFT JOIN 
  event_counts_week_day_average 
ON 
  event_counts_yesterday.event_name = event_counts_week_day_average.event_name
LEFT JOIN 
  event_counts_weekend_average 
ON 
  event_counts_yesterday.event_name = event_counts_weekend_average.event_name
LEFT JOIN event_counts_friday_average
ON
event_counts_yesterday.event_name = event_counts_friday_average.event_name 
ORDER BY 
  event_counts_yesterday.event_count_yesterday DESC;

Query 4: Event parameter checks

This query returns the count of parameters by event seen in yesterday’s data, and for comparison, the equivalent counts for data from the day before yesterday. This is so we can identify where event parameters appear or disappear day to day where we are not expecting this to happen. The query is:

WITH yesterday_event_params AS (
  SELECT
    event_name,
    param.key AS parameter_name,
    COUNT(*) AS count_params_yesterday
  FROM
    `ga4-analytics-352613.analytics_330577055.partitioned_events`,
    UNNEST(event_params) AS param
  WHERE
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY
    1, 2
),

two_days_ago_event_params AS (
  SELECT
    event_name,
    param.key AS parameter_name,
    COUNT(*) AS count_params_two_days_ago
  FROM
    `ga4-analytics-352613.analytics_330577055.partitioned_events`,
    UNNEST(event_params) AS param
  WHERE
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
  GROUP BY
    1, 2
)

SELECT 
  COALESCE(yesterday.event_name, two_days_ago.event_name) AS event_name,
  COALESCE(yesterday.parameter_name, two_days_ago.parameter_name) AS parameter_name,
  COALESCE(count_params_yesterday, 0) AS param_count_yesterday,
  COALESCE(count_params_two_days_ago, 0) AS count_params_two_days_ago,
CASE 
WHEN count_params_yesterday IS NULL THEN 'Disappeared'
WHEN count_params_two_days_ago IS NULL THEN 'New'
ELSE 'Exists'
END AS param_check
FROM
  yesterday_event_params AS yesterday
FULL OUTER JOIN
  two_days_ago_event_params AS two_days_ago
ON
  yesterday.event_name = two_days_ago.event_name
  AND yesterday.parameter_name = two_days_ago.parameter_name

Query 5: Event parameter average count checks

This query again counts parameters by event, but then also returns an average count for the last two weeks in an additional column (depending on if we are looking at a weekend day, a friday or any other week day). This way we can spot any anomalous event parameter counts. The query is:

WITH param_counts_week_day_average AS (
  SELECT 
    event_name, 
    parameter_name,
    ROUND(AVG(daily_count), 0) AS average_param_count_week_day
  FROM (
    SELECT
      event_name,
      param.key as parameter_name,
      COUNT(*) AS daily_count
    FROM
      `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
    GROUP BY
      event_name, event_date, param.key
  )
  GROUP BY event_name, parameter_name
),

param_counts_friday_average AS (
  SELECT 
    event_name, 
    parameter_name,
    ROUND(AVG(daily_count), 0) AS average_param_count_friday
  FROM (
    SELECT
      event_name,
      param.key as parameter_name,
      COUNT(*) AS daily_count
    FROM
      `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) = 6
    GROUP BY
      event_name, event_date, param.key
  )
  GROUP BY event_name, parameter_name
),

param_counts_weekend_average AS (
  SELECT 
    event_name, 
    parameter_name,
    ROUND(AVG(daily_count), 0) AS average_param_count_weekend
  FROM (
    SELECT
      event_name,
      param.key as parameter_name,
      COUNT(*) AS daily_count
    FROM
      `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND (EXTRACT(DAYOFWEEK FROM event_date) = 1 OR EXTRACT(DAYOFWEEK FROM event_date) = 7)
    GROUP BY
      event_name, event_date, param.key
  )
  GROUP BY event_name, parameter_name
),

param_counts_yesterday AS (
  SELECT
    event_name,
    param.key as parameter_name,
    COUNT(*) AS param_count_yesterday
  FROM
    `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
  WHERE
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY
    event_name, parameter_name
)

SELECT
  param_counts_yesterday.event_name,
  param_counts_yesterday.parameter_name,
  param_counts_yesterday.param_count_yesterday,
  CASE 
    WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
    THEN param_counts_friday_average.average_param_count_friday
    WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5 
    THEN param_counts_week_day_average.average_param_count_week_day
    ELSE param_counts_weekend_average.average_param_count_weekend
  END AS average_param_count
FROM 
  param_counts_yesterday
LEFT JOIN 
  param_counts_week_day_average 
ON 
  param_counts_yesterday.event_name = param_counts_week_day_average.event_name AND param_counts_yesterday.parameter_name = param_counts_week_day_average.parameter_name
LEFT JOIN 
  param_counts_weekend_average 
ON 
  param_counts_yesterday.event_name = param_counts_weekend_average.event_name AND param_counts_yesterday.parameter_name = param_counts_weekend_average.parameter_name
LEFT JOIN 
  param_counts_friday_average
ON
  param_counts_yesterday.event_name = param_counts_friday_average.event_name AND param_counts_yesterday.parameter_name = param_counts_friday_average.parameter_name
ORDER BY 
  param_counts_yesterday.param_count_yesterday DESC;

Query 6: Type parameter checks

We sometimes see issues in the data that affect event parameters of particular types. We can spot any anomalous counts in type parameters by comparing yesterday’s count with an average for the last two weeks (again depending on whether we are looking at a weekend day, a Friday or any other week day). The query is:

WITH type_counts_week_day_average AS (
  SELECT 
    parameter_name,
    type_value,
    ROUND(SUM(daily_count) / COUNT(DISTINCT event_date), 0) AS average_type_count_week_day
  FROM (
    SELECT
      param.key as parameter_name,
      (SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
      COUNT(*) AS daily_count,
      event_date
    FROM
      `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 11 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
      AND param.key = 'type'
    GROUP BY
      parameter_name, type_value, event_date
  )
  GROUP BY parameter_name, type_value
),

type_counts_friday_average AS (
  SELECT 
    parameter_name,
    type_value,
    ROUND(SUM(daily_count) / COUNT(DISTINCT event_date), 0) AS average_type_count_friday
  FROM (
    SELECT
      param.key as parameter_name,
      (SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
      COUNT(*) AS daily_count,
      event_date
    FROM
      `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 11 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) = 6
      AND param.key = 'type'
    GROUP BY
      parameter_name, type_value, event_date
  )
  GROUP BY parameter_name, type_value
),

type_counts_weekend_average AS (
  SELECT 
    parameter_name,
    type_value,
    ROUND(SUM(daily_count) / COUNT(DISTINCT event_date), 0) AS average_type_count_weekend
  FROM (
    SELECT
      param.key as parameter_name,
      (SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
      COUNT(*) AS daily_count,
      event_date
    FROM
      `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
    WHERE
      event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 11 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
      AND EXTRACT(DAYOFWEEK FROM event_date) IN (1, 7)
      AND param.key = 'type'
    GROUP BY
      parameter_name, type_value, event_date
  )
  GROUP BY parameter_name, type_value
),

type_counts_yesterday AS (
  SELECT
    param.key as parameter_name,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
    COUNT(*) AS type_count_yesterday
  FROM
    `ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
  WHERE
    event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    AND param.key = 'type'
  GROUP BY
    parameter_name, type_value
)

SELECT
  type_counts_yesterday.parameter_name,
  type_counts_yesterday.type_value,
  type_counts_yesterday.type_count_yesterday,
  CASE 
    WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
    THEN type_counts_friday_average.average_type_count_friday 
    WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5 
    THEN type_counts_week_day_average.average_type_count_week_day
    ELSE type_counts_weekend_average.average_type_count_weekend
  END AS average_type_count
FROM 
  type_counts_yesterday
LEFT JOIN 
  type_counts_week_day_average 
ON 
  type_counts_yesterday.type_value = type_counts_week_day_average.type_value 
  AND type_counts_yesterday.parameter_name = type_counts_week_day_average.parameter_name
LEFT JOIN 
  type_counts_weekend_average 
ON 
  type_counts_yesterday.type_value = type_counts_weekend_average.type_value 
  AND type_counts_yesterday.parameter_name = type_counts_weekend_average.parameter_name
 LEFT JOIN 
  type_counts_friday_average
ON  
  type_counts_yesterday.type_value = type_counts_friday_average.type_value 
  AND type_counts_yesterday.parameter_name = type_counts_friday_average.parameter_name
ORDER BY 
  type_counts_yesterday.type_count_yesterday DESC;

Step 2: Connect data to Google Sheets

The results from these six queries are output into six tables in BigQuery. These tables are then imported into a Google sheet with six tabs, one for each BigQuery table using the BigQuery data connector. This is set to refresh once a day, grabbing the new table data once the queries have been run in BigQuery.

An example of what this sheet looks like can be found here

You will notice that in a number of the tabs there are columns in highlighted colours. These columns contain formulas which identify % change difference between the results from the BigQuery exports, or other formulas such as VLOOKUPS to identify whether or not events/parameters have already been identified/flagged as anomalous in other tabs. These calculated columns are used by Apps Script to build the alerts email.

Step 3: Creating a Google Sheets Apps Script

Finally, there is an Apps Script that runs in the Google sheet on a schedule between 12pm and 1pm each day. The Apps Script interprets the results from the six query tables in the Google sheet, and formats them into an email which is sent to the GOVUK GA4 Alerts Google group.

The email itself is divided into four sections pertaining to different elements of the GA4 data we want to monitor: 1) Basic checks 2) Event checks 3) Event parameter checks and 4) Type parameter checks. Each section will return different messages depending on certain thresholds and conditions met in the output of the query tables.

Here is a basic summary of what it does by outline the checks and conditions for each step:

GA4 Basic Data Checks:

Condition: Checks if the GA4 data is present in the flattened partitioned table (Column B). Threshold: Triggers an alert if the value in cell B2 is “No,” indicating missing data.

Event Count Change:

Condition: Compares event count change (Column E). Threshold: Triggers an alert if the change is greater than ±5% (e.g., > 0.05 or < -0.05).

Event Appearance/Disappearance:

Condition: Checks for new or disappeared events in the event data checks sheet. Threshold: Alerts if events marked as “New” or “Disappeared” are detected.

Event Count Average Checks:

Condition: Analyzes individual event counts (Column D) for significant percentage changes. Threshold: Triggers if: Change exceeds ±15% compared to averages and yesterday’s or two days ago’s count is greater than 10,000.

Event Parameter Appearance/Disappearance:

Condition: Identifies new or disappeared parameters for events. Threshold: Alerts if: New parameters with counts >100 appear or disappeared parameters with counts >100 are found.

Parameter Count Change:

Condition: Examines parameter count changes (Column E). Threshold: Triggers if: Change exceeds ±20% and yesterday’s or two days ago’s count >10,000.

Type Parameter Appearance/Disappearance:

Condition: Checks for new or disappeared type parameters. Threshold: Alerts if any new or disappeared type parameters are detected.

Type Parameter Count Change:

Condition: Analyzes type parameter count changes (Column E). Threshold: Triggers if: Change exceeds ±20% and yesterday’s or two days ago’s count >1,000.

The Apps Script is written in javascript and the full code is as follows:

function checkEventCountChange() {
  // Define the sheets and email details
  const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  const ga4DataSheet = spreadsheet.getSheetByName("ga4_data_checks_sheet");
  const ga4EventSheet = spreadsheet.getSheetByName("ga4_event_data_checks_sheet");
  const ga4EventCountAverageSheet = spreadsheet.getSheetByName("ga4_event_count_average_checks_sheet");
  const ga4EventParamCheckSheet = spreadsheet.getSheetByName("ga4_event_param_check_sheet");
  const ga4ParamCountAverageSheet = spreadsheet.getSheetByName("ga4_param_count_average");
  const ga4TypeParamAveragesSheet = spreadsheet.getSheetByName("ga4_type_param_averages");

  // Define the email addresses (replace with the actual email addresses)
  const primaryEmail = "edward.billett@digital.cabinet-office.gov.uk";
  const secondaryEmail1 = "govuk-ga4-alerts@digital.cabinet-office.gov.uk";
  const emailRecipients = `${primaryEmail},${secondaryEmail1}`;

  // Retrieve the date from cell A2 of ga4_data_checks_sheet and format it
  const dateValue = ga4DataSheet.getRange("A2").getValue();
  const formattedDateDisplay = Utilities.formatDate(new Date(dateValue), Session.getScriptTimeZone(), "MMM dd yyyy");
  const formattedDateSubject = Utilities.formatDate(new Date(dateValue), Session.getScriptTimeZone(), "yyyy-MM-dd");

  // Start building the email content with the date information at the top, followed by GA4 Basic Data Checks section
  let messageBody = `<p><b>The below checks are based on data for: ${formattedDateDisplay}</b></p><br><b><u>GA4 Basic Data Checks</u></b><br><br><ol>`;

  // Check column B in GA4 Data Checks sheet for 'Yes' or 'No'
  const columnB = ga4DataSheet.getRange("B2:B").getValues();
  if (columnB[0][0] === 'Yes') {
    messageBody += "<li>✔️ Yesterday's GA4 data is present in the flattened partitioned table.</li>";
  } else {
    messageBody += "<li>❌ Yesterday's flattened partitioned GA4 data is missing.</li>";

    // Send an email with only this alert and exit the function
    MailApp.sendEmail({
      to: emailRecipients, // Sends to all three recipients
      subject: `Data for ${formattedDateSubject} GA4 Data Check Alert: Missing Flattened Data`,
      htmlBody: messageBody
    });
    return; // Exit the function without continuing
  }

// Check column E in GA4 Data Checks sheet for values > 5% or < -5%
const columnE = ga4DataSheet.getRange("E2:E").getValues();
let alertTriggered = false;

// Get the value from cell E2 and format it as a percentage
const e2Value = ga4DataSheet.getRange("E2").getValue();
const formattedE2Value = (e2Value * 100).toFixed(1) + "%"; // Format as percentage with 1 decimal place

columnE.forEach(row => {
  const value = parseFloat(row[0]);
  if (!isNaN(value) && (value > 0.05 || value < -0.05)) {
    alertTriggered = true;
  }
});

if (alertTriggered) {
  messageBody += `<li>❌ Oh no! The change in event count between yesterday and previous days is greater than 5% (${formattedE2Value}).</li>`;
} else {
  messageBody += "<li>✔️ Yesterday's total event count compared with previous days is within normal bounds.</li>";
}

messageBody += "</ol><br><b><u>GA4 Event Data Checks</u></b><br><br><ol>";

  // GA4 Event Data Checks for ga4_event_data_checks_sheet
  const columnAEvent = ga4EventSheet.getRange("A2:A").getValues();
  const columnDEvent = ga4EventSheet.getRange("D2:D").getValues();

  let disappearedEvents = [];
  let newEvents = [];

  columnDEvent.forEach((row, index) => {
    if (row[0] === 'Disappeared') {
      disappearedEvents.push(columnAEvent[index][0]);
    } else if (row[0] === 'New') {
      newEvents.push(columnAEvent[index][0]);
    }
  });

 if (disappearedEvents.length > 0) {
    messageBody += `<li>❌ The following events were not seen in yesterday's data, but were seen two days ago:<ul>`;
    disappearedEvents.forEach(event => messageBody += `<li>${event}</li>`);
    messageBody += "</ul></li>";
  } else if (newEvents.length > 0) {
    messageBody += `<li>❌ The following events were seen in yesterday's data, but were not seen two days ago:<ul>`;
    newEvents.forEach(event => messageBody += `<li>${event}</li>`);
    messageBody += "</ul></li>";
  } else {
    messageBody += "<li>✔️ No new events detected between yesterday and two days ago. No events have disappeared between yesterday and two days ago.</li>";
  }

  // GA4 Event Count Average Checks for ga4_event_count_average_checks_sheet
  const eventCountColumnA = ga4EventCountAverageSheet.getRange("A2:A").getValues();
  const eventCountColumnB = ga4EventCountAverageSheet.getRange("B2:B").getValues();
  const eventCountColumnC = ga4EventCountAverageSheet.getRange("C2:C").getValues();
  const eventCountColumnD = ga4EventCountAverageSheet.getRange("D2:D").getValues();

  let outlierEvents = [];

  eventCountColumnD.forEach((row, index) => {
    const percentageChange = parseFloat(row[0]);
    const yesterdayCount = parseFloat(eventCountColumnB[index][0]);
    const twoDaysAgoCount = parseFloat(eventCountColumnC[index][0]);
    const eventName = eventCountColumnA[index][0];

    if (!isNaN(percentageChange) && (percentageChange > 0.15 || percentageChange < -0.15) && (yesterdayCount > 10000 || twoDaysAgoCount > 10000)) {
      outlierEvents.push(eventName);
    }
  });

if (outlierEvents.length > 0) {
  messageBody += `<li>❌ The following events have seen an event count increase/decrease of more than 15% compared with average counts. Consider investigating:<ul>`;
  outlierEvents.forEach(event => {
    const eventIndex = eventCountColumnA.findIndex(row => row[0] === event);
    const percentageChange = eventCountColumnD[eventIndex][0].toLocaleString('en-US', { style: 'percent', minimumFractionDigits: 2 });
    messageBody += `<li>${event} (${percentageChange})</li>`;
  });
  messageBody += "</ul></li>";
} else {
  messageBody += "<li>✔️ Yesterday's individual event counts compared with previous days are within normal bounds</li>";
}

// GA4 Event Parameter Checks for ga4_event_param_check_sheet
messageBody += "</ol><br><b><u>GA4 Event Parameter Checks</u></b><br><br><ol>";

const paramCheckColumnA = ga4EventParamCheckSheet.getRange("A2:A").getValues();
const paramCheckColumnB = ga4EventParamCheckSheet.getRange("B2:B").getValues();
const paramCheckColumnC = ga4EventParamCheckSheet.getRange("C2:C").getValues();
const paramCheckColumnD = ga4EventParamCheckSheet.getRange("D2:D").getValues();
const paramCheckColumnE = ga4EventParamCheckSheet.getRange("E2:E").getValues();
const paramCheckColumnF = ga4EventParamCheckSheet.getRange("F2:F").getValues(); // Column F values

let newParams = {};
let disappearedParams = {};

paramCheckColumnE.forEach((row, index) => {
  const status = row[0];
  const paramName = paramCheckColumnB[index][0];
  const eventName = paramCheckColumnA[index][0];
  const countC = parseFloat(paramCheckColumnC[index][0]);
  const countD = parseFloat(paramCheckColumnD[index][0]);
  const statusF = paramCheckColumnF[index][0]; // Get the value from column F

  // Only process if status is not 'New' or 'Disappeared' in column F
  if (statusF !== 'New' && statusF !== 'Disappeared') {
    if (status === 'New' && (countC > 100 || countD > 100)) {
      if (!newParams[eventName]) newParams[eventName] = [];
      newParams[eventName].push(paramName);
    } else if (status === 'Disappeared' && (countC > 100 || countD > 100)) {
      if (!disappearedParams[eventName]) disappearedParams[eventName] = [];
      disappearedParams[eventName].push(paramName);
    }
  }
});

if (Object.keys(newParams).length > 0) {
  messageBody += "<li>❌ New parameters detected:<ul>";
  for (const [event, params] of Object.entries(newParams)) {
    messageBody += `<li>${event}<ul>`;
    params.forEach(param => messageBody += `<li>${param}</li>`);
    messageBody += "</ul></li>";
  }
  messageBody += "</ul></li>";
} else if (Object.keys(disappearedParams).length > 0) {
  messageBody += "<li>❌ Missing parameters:<ul>";
  for (const [event, params] of Object.entries(disappearedParams)) {
    messageBody += `<li>${event}<ul>`;
    params.forEach(param => messageBody += `<li>${param}</li>`);
    messageBody += "</ul></li>";
  }
  messageBody += "</ul></li>";
} else {
  messageBody += "<li>✔️ No new or missing event parameters returned in yesterday's data above a significant threshold</li>";
}

 // Check ga4_param_count_average for parameter count changes
const paramCountColumnA = ga4ParamCountAverageSheet.getRange("A2:A").getValues();
const paramCountColumnB = ga4ParamCountAverageSheet.getRange("B2:B").getValues();
const paramCountColumnC = ga4ParamCountAverageSheet.getRange("C2:C").getValues();
const paramCountColumnD = ga4ParamCountAverageSheet.getRange("D2:D").getValues();
const paramCountColumnE = ga4ParamCountAverageSheet.getRange("E2:E").getValues();
const paramCountColumnF = ga4ParamCountAverageSheet.getRange("F2:F").getValues(); // New column F check

let outlierParamEvents = {};

paramCountColumnE.forEach((row, index) => {
  const percentageChange = parseFloat(row[0]);
  const yesterdayCount = parseFloat(paramCountColumnC[index][0]);
  const twoDaysAgoCount = parseFloat(paramCountColumnD[index][0]);
  const eventName = paramCountColumnA[index][0];
  const paramName = paramCountColumnB[index][0];
  const statusF = paramCountColumnF[index][0]; // Get the value from column F

  // Check if column F is not 'Yes' before processing
  if (statusF !== 'Yes' && !isNaN(percentageChange) && (percentageChange > 0.20 || percentageChange < -0.20) && (yesterdayCount > 10000 || twoDaysAgoCount > 10000)) {
    if (!outlierParamEvents[eventName]) outlierParamEvents[eventName] = [];
    outlierParamEvents[eventName].push(paramName);
  }
});

if (Object.keys(outlierParamEvents).length > 0) {
  messageBody += "<li>❌ Change in event parameter count! There was a change of more than 20% in the counts for the below event parameters:<ol>";
  for (const [event, params] of Object.entries(outlierParamEvents)) {
    messageBody += `<li>${event}<ul>`;
    params.forEach(param => messageBody += `<li>${param}</li>`);
    messageBody += "</ul></li>";
  }
  messageBody += "</ol></li>";
} else {
  messageBody += "<li>✔️ Yesterday's event parameter counts compared with previous days is within normal bounds.</li>";
}

messageBody += "</ol>"; // Close the ordered list

  // Check ga4_type_param_averages for type parameter changes
messageBody += "<br><b><u>GA4 Type Parameter Checks</u></b><br><br>"; // Add <br> here for line break
  const typeParamColumnB = ga4TypeParamAveragesSheet.getRange("B2:B").getValues();
  const typeParamColumnC = ga4TypeParamAveragesSheet.getRange("C2:C").getValues();
  const typeParamColumnD = ga4TypeParamAveragesSheet.getRange("D2:D").getValues();
  const typeParamColumnE = ga4TypeParamAveragesSheet.getRange("E2:E").getValues();

  let outlierTypes = [];

  typeParamColumnE.forEach((row, index) => {
    const percentageChange = parseFloat(row[0]);
    const yesterdayCount = parseFloat(typeParamColumnC[index][0]);
    const twoDaysAgoCount = parseFloat(typeParamColumnD[index][0]);
    const typeName = typeParamColumnB[index][0];

    if (!isNaN(percentageChange) && (percentageChange > 0.20 || percentageChange < -0.20) && (yesterdayCount > 1000 || twoDaysAgoCount > 1000)) {
      outlierTypes.push(typeName);
    }
  });

if (typeParamColumnE.some(row => row[0] === 'New') || typeParamColumnE.some(row => row[0] === 'Disappeared')) {
  // Check for new type parameters
  const newTypes = typeParamColumnE
    .map((row, index) => ({ typeName: typeParamColumnB[index][0], status: row[0] }))
    .filter(row => row.status === 'New')
    .map(row => row.typeName);

  if (newTypes.length > 0) {
    messageBody += `<li>❌ Oh wow! A new type parameter has appeared in yesterday's data. It's called:<ul>`;
    newTypes.forEach(newType => {
      messageBody += `<li>${newType}</li>`;
    });
    messageBody += "</ul></li>";
  }

  // Check for disappeared type parameters
  const disappearedTypes = typeParamColumnE
    .map((row, index) => ({ typeName: typeParamColumnB[index][0], status: row[0] }))
    .filter(row => row.status === 'Disappeared')
    .map(row => row.typeName);

  if (disappearedTypes.length > 0) {
    messageBody += `<li>❌ Oh no! A type parameter has disappeared from yesterday's data. It was:<ul>`;
    disappearedTypes.forEach(disappearedType => {
      messageBody += `<li>${disappearedType}</li>`;
    });
    messageBody += "</ul></li>";
  }
}

if (outlierTypes.length > 0) {
  messageBody += `<li>❌ Change in 'type' parameter count! The change in type parameter count between yesterday and previous days for certain types is greater than 20%. They are:<ul>`;
  outlierTypes.forEach(type => {
    const typeIndex = typeParamColumnB.findIndex(row => row[0] === type);
    const percentageChange = typeParamColumnE[typeIndex][0].toLocaleString('en-US', { style: 'percent', minimumFractionDigits: 2 });
    messageBody += `<li>${type} (${percentageChange})</li>`;
  });
  messageBody += "</ul></li>";
} else {
  messageBody += "<li>✔️ Yesterday's type parameter counts compared with previous days are within normal bounds.</li>";
}

  // Close the ordered list tags
  messageBody += "</ol>";

  // Send the email with the final message
    MailApp.sendEmail({
    to: emailRecipients,
    subject: `GA4 Data Check Alert for ${formattedDateSubject}`,
    htmlBody: messageBody
  });
}

Verifying and investigating data alerts

The outputs from the alerts process are sent to the GOVUK GA4 Alerts Google group. If there are any flags in the data, then a member of this group can begin to investigate the issue(s). Not all flags raised by the alert process will warrant investigating, however. For example, if a rise in the count of ab_test event parameters is flagged, this might be expected behaviour if an AB test somewhere on GOV.UK has recently begun or concluded.

The output of this alert can be cross referenced with the GOV.UK GA4 data quality report which can be found here. This report visualises GA4 events, event type parameters and statistics by page locations over time. By using this report in combination with the alerts process, we should be able to verify and narrow down anomalies in the data, viewed in the context of longer time periods.

Get access to the GOV.UK GA4 Data Alerts Output

Those who would like to receive these alerts can request to join the GOVUK GA4 Alerts Google group. Please get in touch with the GOV.UK Insights and Analytics team if you would like adding to this list.

In the future we would like to make a web page or dashboard where we can present this information in some kind of ‘live status’ page, though this will probably come later once we have refined the data outputs in the email alert.

This page was last reviewed on 26 November 2024. It needs to be reviewed again on 26 May 2025 .
This page was set to be reviewed before 26 May 2025. This might mean the content is out of date.