GOV.UK GA4 Data Alert Processes
There are a number of ways we monitor the quality and integrity of the GOV.UK GA4 data. These include basic alerts which tell us whether or not yesterday’s data has arrived and has been processed in BigQuery, as well as a number of data quality/annotation dashboards.
This page focuses on a separate, more detailed data alerts process which notifies us of more specific changes to the GOV.UK GA4, and whether or not these changes fall outside of expected bounds.
The check runs each day, examining the previous day’s data and sends the results by email to the GOVUK GA4 Alerts Google group.
How has the alerts process been set up?
The data alert check is a multi step process, starting with running queries against the data in BigQuery. The results of these queries are then exported into Google sheets, and from here an Apps Script runs which sends alerts based on particular thresholds/parameters.
Here is the process is more detail:
Step 1: BigQuery
There are six queries which run each day to check yesterday’s GOV.UK GA4 data. They query either the flattened partitioned table, or the partitioned nested tables (depending on the query).
They run as scheduled queries in the gds-bq-reporting
project, and each output results to tables which are overwritten each day.
The queries are as following
Query 1: Basic checks
This query simply outputs yesterday’s event date, and whether or not it exists in the partitioned flattened table. It also calculates the previous day’s event count, and another column with the average event count over the last two weeks for either a weekend day, a friday, or any other week day for comparison. The query is:
-- yesterday exists checker
with yesterday_check as (
SELECT
DISTINCT
event_date,
CASE
WHEN event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
THEN 'Yes' ELSE 'No'
END AS date_exists
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),
- yesterday total event count
total_event_count AS (
SELECT
count (*) AS date_event_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)),
-- average event count last two weeks for week days
week_day_average_event_count AS (
SELECT
ROUND(AVG(event_count),0) AS average_event_count_weekday
FROM (
SELECT
count (*) AS event_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()
AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
GROUP BY
event_date ) ),
-- average event count last two weeks for weekends
weekend_average_event_count as (
SELECT
ROUND(AVG(event_count),0) AS average_event_count_weekend
FROM (
SELECT
count (*) AS event_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()
AND (EXTRACT(DAYOFWEEK FROM event_date) = 1 OR EXTRACT(DAYOFWEEK FROM event_date) = 7)
GROUP BY
event_date)),
friday_average_event_count as (
SELECT
ROUND(AVG(event_count),0) AS average_event_count_friday
FROM (
SELECT
count (*) AS event_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()
AND (EXTRACT(DAYOFWEEK FROM event_date) = 6)
GROUP BY
event_date))
-- selects the event date, whether or not it exists, the count of events, and the average count of events over the last two weeks
SELECT
yesterday_check.event_date,
yesterday_check.date_exists,
total_event_count.date_event_count,
CASE
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
THEN friday_average_event_count.average_event_count_friday
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5
THEN week_day_average_event_count.average_event_count_weekday
ELSE weekend_average_event_count.average_event_count_weekend
END AS average_event_count_last_two_weeks,
FROM
total_event_count,
week_day_average_event_count,
weekend_average_event_count,
friday_average_event_count,
yesterday_check
Query 2: Event data checks
This query looks at the distinct event names which appeared in the data for the previous day, and compares those to the events which appeared in the GA4 data two days earlier. This query helps us to identify whether events have appeared or disappeared unexpectedly between any two days. The query is:
WITH event_counts_two_days_ago AS (
SELECT
event_name AS distinct_event_name,
COUNT(*) AS event_count_two_days_ago
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
--event_date = '2024-04-07'
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
GROUP BY
event_name
),
event_counts_yesterday AS (
SELECT
event_name AS distinct_event_name,
COUNT(*) AS event_count_yesterday
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
--event_date = '2024-04-08'
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
event_name
)
SELECT
COALESCE(yesterday.distinct_event_name, two_days_ago.distinct_event_name) AS event_name,
COALESCE(event_count_yesterday, 0) AS event_count_yesterday,
COALESCE(event_count_two_days_ago, 0) AS event_count_two_days_ago,
CASE
WHEN event_count_yesterday IS NULL THEN 'Disappeared'
WHEN event_count_two_days_ago IS NULL THEN 'New'
ELSE 'Exists'
END AS event_check
FROM
event_counts_yesterday AS yesterday
FULL OUTER JOIN
event_counts_two_days_ago AS two_days_ago
ON
yesterday.distinct_event_name = two_days_ago.distinct_event_name
ORDER BY
event_check, event_name;
Query 3: Event count average checks
This query returns the event count for the previous day, and another column comparing it to the average event count over the last two weeks for either a weekend day, a friday, or any other week day for comparison. The query is:
WITH event_counts_week_day_average AS (
SELECT
event_name,
ROUND(AVG(daily_count), 0) AS average_event_count_week_day
FROM (
SELECT
event_name,
COUNT(*) AS daily_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
GROUP BY
event_name, event_date
)
GROUP BY event_name
),
event_counts_friday_average AS (
SELECT
event_name,
ROUND(AVG(daily_count), 0) AS average_event_count_friday
FROM (
SELECT
event_name,
COUNT(*) AS daily_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) = 6
GROUP BY
event_name, event_date
)
GROUP BY event_name
),
event_counts_weekend_average AS (
SELECT
event_name,
ROUND(AVG(daily_count), 0) AS average_event_count_weekend
FROM (
SELECT
event_name,
COUNT(*) AS daily_count
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND (EXTRACT(DAYOFWEEK FROM event_date) = 1 OR EXTRACT(DAYOFWEEK FROM event_date) = 7)
GROUP BY
event_name, event_date
)
GROUP BY event_name
),
event_counts_yesterday AS (
SELECT
event_name,
COUNT(*) AS event_count_yesterday
FROM
`ga4-analytics-352613.flattened_dataset.partitioned_flattened_events`
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
event_name
)
SELECT
event_counts_yesterday.event_name,
event_counts_yesterday.event_count_yesterday,
CASE
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
THEN event_counts_friday_average.average_event_count_friday
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5
THEN event_counts_week_day_average.average_event_count_week_day
ELSE event_counts_weekend_average.average_event_count_weekend
END AS average_event_count
FROM
event_counts_yesterday
LEFT JOIN
event_counts_week_day_average
ON
event_counts_yesterday.event_name = event_counts_week_day_average.event_name
LEFT JOIN
event_counts_weekend_average
ON
event_counts_yesterday.event_name = event_counts_weekend_average.event_name
LEFT JOIN event_counts_friday_average
ON
event_counts_yesterday.event_name = event_counts_friday_average.event_name
ORDER BY
event_counts_yesterday.event_count_yesterday DESC;
Query 4: Event parameter checks
This query returns the count of parameters by event seen in yesterday’s data, and for comparison, the equivalent counts for data from the day before yesterday. This is so we can identify where event parameters appear or disappear day to day where we are not expecting this to happen. The query is:
WITH yesterday_event_params AS (
SELECT
event_name,
param.key AS parameter_name,
COUNT(*) AS count_params_yesterday
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`,
UNNEST(event_params) AS param
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
1, 2
),
two_days_ago_event_params AS (
SELECT
event_name,
param.key AS parameter_name,
COUNT(*) AS count_params_two_days_ago
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`,
UNNEST(event_params) AS param
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
GROUP BY
1, 2
)
SELECT
COALESCE(yesterday.event_name, two_days_ago.event_name) AS event_name,
COALESCE(yesterday.parameter_name, two_days_ago.parameter_name) AS parameter_name,
COALESCE(count_params_yesterday, 0) AS param_count_yesterday,
COALESCE(count_params_two_days_ago, 0) AS count_params_two_days_ago,
CASE
WHEN count_params_yesterday IS NULL THEN 'Disappeared'
WHEN count_params_two_days_ago IS NULL THEN 'New'
ELSE 'Exists'
END AS param_check
FROM
yesterday_event_params AS yesterday
FULL OUTER JOIN
two_days_ago_event_params AS two_days_ago
ON
yesterday.event_name = two_days_ago.event_name
AND yesterday.parameter_name = two_days_ago.parameter_name
Query 5: Event parameter average count checks
This query again counts parameters by event, but then also returns an average count for the last two weeks in an additional column (depending on if we are looking at a weekend day, a friday or any other week day). This way we can spot any anomalous event parameter counts. The query is:
WITH param_counts_week_day_average AS (
SELECT
event_name,
parameter_name,
ROUND(AVG(daily_count), 0) AS average_param_count_week_day
FROM (
SELECT
event_name,
param.key as parameter_name,
COUNT(*) AS daily_count
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
GROUP BY
event_name, event_date, param.key
)
GROUP BY event_name, parameter_name
),
param_counts_friday_average AS (
SELECT
event_name,
parameter_name,
ROUND(AVG(daily_count), 0) AS average_param_count_friday
FROM (
SELECT
event_name,
param.key as parameter_name,
COUNT(*) AS daily_count
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) = 6
GROUP BY
event_name, event_date, param.key
)
GROUP BY event_name, parameter_name
),
param_counts_weekend_average AS (
SELECT
event_name,
parameter_name,
ROUND(AVG(daily_count), 0) AS average_param_count_weekend
FROM (
SELECT
event_name,
param.key as parameter_name,
COUNT(*) AS daily_count
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND (EXTRACT(DAYOFWEEK FROM event_date) = 1 OR EXTRACT(DAYOFWEEK FROM event_date) = 7)
GROUP BY
event_name, event_date, param.key
)
GROUP BY event_name, parameter_name
),
param_counts_yesterday AS (
SELECT
event_name,
param.key as parameter_name,
COUNT(*) AS param_count_yesterday
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
event_name, parameter_name
)
SELECT
param_counts_yesterday.event_name,
param_counts_yesterday.parameter_name,
param_counts_yesterday.param_count_yesterday,
CASE
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
THEN param_counts_friday_average.average_param_count_friday
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5
THEN param_counts_week_day_average.average_param_count_week_day
ELSE param_counts_weekend_average.average_param_count_weekend
END AS average_param_count
FROM
param_counts_yesterday
LEFT JOIN
param_counts_week_day_average
ON
param_counts_yesterday.event_name = param_counts_week_day_average.event_name AND param_counts_yesterday.parameter_name = param_counts_week_day_average.parameter_name
LEFT JOIN
param_counts_weekend_average
ON
param_counts_yesterday.event_name = param_counts_weekend_average.event_name AND param_counts_yesterday.parameter_name = param_counts_weekend_average.parameter_name
LEFT JOIN
param_counts_friday_average
ON
param_counts_yesterday.event_name = param_counts_friday_average.event_name AND param_counts_yesterday.parameter_name = param_counts_friday_average.parameter_name
ORDER BY
param_counts_yesterday.param_count_yesterday DESC;
Query 6: Type parameter checks
We sometimes see issues in the data that affect event parameters of particular types. We can spot any anomalous counts in type parameters by comparing yesterday’s count with an average for the last two weeks (again depending on whether we are looking at a weekend day, a Friday or any other week day). The query is:
WITH type_counts_week_day_average AS (
SELECT
parameter_name,
type_value,
ROUND(SUM(daily_count) / COUNT(DISTINCT event_date), 0) AS average_type_count_week_day
FROM (
SELECT
param.key as parameter_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
COUNT(*) AS daily_count,
event_date
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 11 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) BETWEEN 2 AND 5
AND param.key = 'type'
GROUP BY
parameter_name, type_value, event_date
)
GROUP BY parameter_name, type_value
),
type_counts_friday_average AS (
SELECT
parameter_name,
type_value,
ROUND(SUM(daily_count) / COUNT(DISTINCT event_date), 0) AS average_type_count_friday
FROM (
SELECT
param.key as parameter_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
COUNT(*) AS daily_count,
event_date
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 11 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) = 6
AND param.key = 'type'
GROUP BY
parameter_name, type_value, event_date
)
GROUP BY parameter_name, type_value
),
type_counts_weekend_average AS (
SELECT
parameter_name,
type_value,
ROUND(SUM(daily_count) / COUNT(DISTINCT event_date), 0) AS average_type_count_weekend
FROM (
SELECT
param.key as parameter_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
COUNT(*) AS daily_count,
event_date
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 11 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND EXTRACT(DAYOFWEEK FROM event_date) IN (1, 7)
AND param.key = 'type'
GROUP BY
parameter_name, type_value, event_date
)
GROUP BY parameter_name, type_value
),
type_counts_yesterday AS (
SELECT
param.key as parameter_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'type') as type_value,
COUNT(*) AS type_count_yesterday
FROM
`ga4-analytics-352613.analytics_330577055.partitioned_events`, UNNEST(event_params) as param
WHERE
event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
AND param.key = 'type'
GROUP BY
parameter_name, type_value
)
SELECT
type_counts_yesterday.parameter_name,
type_counts_yesterday.type_value,
type_counts_yesterday.type_count_yesterday,
CASE
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) = 6
THEN type_counts_friday_average.average_type_count_friday
WHEN EXTRACT(DAYOFWEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) BETWEEN 2 AND 5
THEN type_counts_week_day_average.average_type_count_week_day
ELSE type_counts_weekend_average.average_type_count_weekend
END AS average_type_count
FROM
type_counts_yesterday
LEFT JOIN
type_counts_week_day_average
ON
type_counts_yesterday.type_value = type_counts_week_day_average.type_value
AND type_counts_yesterday.parameter_name = type_counts_week_day_average.parameter_name
LEFT JOIN
type_counts_weekend_average
ON
type_counts_yesterday.type_value = type_counts_weekend_average.type_value
AND type_counts_yesterday.parameter_name = type_counts_weekend_average.parameter_name
LEFT JOIN
type_counts_friday_average
ON
type_counts_yesterday.type_value = type_counts_friday_average.type_value
AND type_counts_yesterday.parameter_name = type_counts_friday_average.parameter_name
ORDER BY
type_counts_yesterday.type_count_yesterday DESC;
Step 2: Connect data to Google Sheets
The results from these six queries are output into six tables in BigQuery. These tables are then imported into a Google sheet with six tabs, one for each BigQuery table using the BigQuery data connector. This is set to refresh once a day, grabbing the new table data once the queries have been run in BigQuery.
An example of what this sheet looks like can be found here
You will notice that in a number of the tabs there are columns in highlighted colours. These columns contain formulas which identify % change difference between the results from the BigQuery exports, or other formulas such as VLOOKUPS to identify whether or not events/parameters have already been identified/flagged as anomalous in other tabs. These calculated columns are used by Apps Script to build the alerts email.
Step 3: Creating a Google Sheets Apps Script
Finally, there is an Apps Script that runs in the Google sheet on a schedule between 12pm and 1pm each day. The Apps Script interprets the results from the six query tables in the Google sheet, and formats them into an email which is sent to the GOVUK GA4 Alerts Google group.
The email itself is divided into four sections pertaining to different elements of the GA4 data we want to monitor: 1) Basic checks 2) Event checks 3) Event parameter checks and 4) Type parameter checks. Each section will return different messages depending on certain thresholds and conditions met in the output of the query tables.
Here is a basic summary of what it does by outline the checks and conditions for each step:
GA4 Basic Data Checks:
Condition: Checks if the GA4 data is present in the flattened partitioned table (Column B). Threshold: Triggers an alert if the value in cell B2 is “No,” indicating missing data.
Event Count Change:
Condition: Compares event count change (Column E). Threshold: Triggers an alert if the change is greater than ±5% (e.g., > 0.05 or < -0.05).
Event Appearance/Disappearance:
Condition: Checks for new or disappeared events in the event data checks sheet. Threshold: Alerts if events marked as “New” or “Disappeared” are detected.
Event Count Average Checks:
Condition: Analyzes individual event counts (Column D) for significant percentage changes. Threshold: Triggers if: Change exceeds ±15% compared to averages and yesterday’s or two days ago’s count is greater than 10,000.
Event Parameter Appearance/Disappearance:
Condition: Identifies new or disappeared parameters for events. Threshold: Alerts if: New parameters with counts >100 appear or disappeared parameters with counts >100 are found.
Parameter Count Change:
Condition: Examines parameter count changes (Column E). Threshold: Triggers if: Change exceeds ±20% and yesterday’s or two days ago’s count >10,000.
Type Parameter Appearance/Disappearance:
Condition: Checks for new or disappeared type parameters. Threshold: Alerts if any new or disappeared type parameters are detected.
Type Parameter Count Change:
Condition: Analyzes type parameter count changes (Column E). Threshold: Triggers if: Change exceeds ±20% and yesterday’s or two days ago’s count >1,000.
The Apps Script is written in javascript and the full code is as follows:
function checkEventCountChange() {
// Define the sheets and email details
const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
const ga4DataSheet = spreadsheet.getSheetByName("ga4_data_checks_sheet");
const ga4EventSheet = spreadsheet.getSheetByName("ga4_event_data_checks_sheet");
const ga4EventCountAverageSheet = spreadsheet.getSheetByName("ga4_event_count_average_checks_sheet");
const ga4EventParamCheckSheet = spreadsheet.getSheetByName("ga4_event_param_check_sheet");
const ga4ParamCountAverageSheet = spreadsheet.getSheetByName("ga4_param_count_average");
const ga4TypeParamAveragesSheet = spreadsheet.getSheetByName("ga4_type_param_averages");
// Define the email addresses (replace with the actual email addresses)
const primaryEmail = "edward.billett@digital.cabinet-office.gov.uk";
const secondaryEmail1 = "govuk-ga4-alerts@digital.cabinet-office.gov.uk";
const emailRecipients = `${primaryEmail},${secondaryEmail1}`;
// Retrieve the date from cell A2 of ga4_data_checks_sheet and format it
const dateValue = ga4DataSheet.getRange("A2").getValue();
const formattedDateDisplay = Utilities.formatDate(new Date(dateValue), Session.getScriptTimeZone(), "MMM dd yyyy");
const formattedDateSubject = Utilities.formatDate(new Date(dateValue), Session.getScriptTimeZone(), "yyyy-MM-dd");
// Start building the email content with the date information at the top, followed by GA4 Basic Data Checks section
let messageBody = `<p><b>The below checks are based on data for: ${formattedDateDisplay}</b></p><br><b><u>GA4 Basic Data Checks</u></b><br><br><ol>`;
// Check column B in GA4 Data Checks sheet for 'Yes' or 'No'
const columnB = ga4DataSheet.getRange("B2:B").getValues();
if (columnB[0][0] === 'Yes') {
messageBody += "<li>✔️ Yesterday's GA4 data is present in the flattened partitioned table.</li>";
} else {
messageBody += "<li>❌ Yesterday's flattened partitioned GA4 data is missing.</li>";
// Send an email with only this alert and exit the function
MailApp.sendEmail({
to: emailRecipients, // Sends to all three recipients
subject: `Data for ${formattedDateSubject} GA4 Data Check Alert: Missing Flattened Data`,
htmlBody: messageBody
});
return; // Exit the function without continuing
}
// Check column E in GA4 Data Checks sheet for values > 5% or < -5%
const columnE = ga4DataSheet.getRange("E2:E").getValues();
let alertTriggered = false;
// Get the value from cell E2 and format it as a percentage
const e2Value = ga4DataSheet.getRange("E2").getValue();
const formattedE2Value = (e2Value * 100).toFixed(1) + "%"; // Format as percentage with 1 decimal place
columnE.forEach(row => {
const value = parseFloat(row[0]);
if (!isNaN(value) && (value > 0.05 || value < -0.05)) {
alertTriggered = true;
}
});
if (alertTriggered) {
messageBody += `<li>❌ Oh no! The change in event count between yesterday and previous days is greater than 5% (${formattedE2Value}).</li>`;
} else {
messageBody += "<li>✔️ Yesterday's total event count compared with previous days is within normal bounds.</li>";
}
messageBody += "</ol><br><b><u>GA4 Event Data Checks</u></b><br><br><ol>";
// GA4 Event Data Checks for ga4_event_data_checks_sheet
const columnAEvent = ga4EventSheet.getRange("A2:A").getValues();
const columnDEvent = ga4EventSheet.getRange("D2:D").getValues();
let disappearedEvents = [];
let newEvents = [];
columnDEvent.forEach((row, index) => {
if (row[0] === 'Disappeared') {
disappearedEvents.push(columnAEvent[index][0]);
} else if (row[0] === 'New') {
newEvents.push(columnAEvent[index][0]);
}
});
if (disappearedEvents.length > 0) {
messageBody += `<li>❌ The following events were not seen in yesterday's data, but were seen two days ago:<ul>`;
disappearedEvents.forEach(event => messageBody += `<li>${event}</li>`);
messageBody += "</ul></li>";
} else if (newEvents.length > 0) {
messageBody += `<li>❌ The following events were seen in yesterday's data, but were not seen two days ago:<ul>`;
newEvents.forEach(event => messageBody += `<li>${event}</li>`);
messageBody += "</ul></li>";
} else {
messageBody += "<li>✔️ No new events detected between yesterday and two days ago. No events have disappeared between yesterday and two days ago.</li>";
}
// GA4 Event Count Average Checks for ga4_event_count_average_checks_sheet
const eventCountColumnA = ga4EventCountAverageSheet.getRange("A2:A").getValues();
const eventCountColumnB = ga4EventCountAverageSheet.getRange("B2:B").getValues();
const eventCountColumnC = ga4EventCountAverageSheet.getRange("C2:C").getValues();
const eventCountColumnD = ga4EventCountAverageSheet.getRange("D2:D").getValues();
let outlierEvents = [];
eventCountColumnD.forEach((row, index) => {
const percentageChange = parseFloat(row[0]);
const yesterdayCount = parseFloat(eventCountColumnB[index][0]);
const twoDaysAgoCount = parseFloat(eventCountColumnC[index][0]);
const eventName = eventCountColumnA[index][0];
if (!isNaN(percentageChange) && (percentageChange > 0.15 || percentageChange < -0.15) && (yesterdayCount > 10000 || twoDaysAgoCount > 10000)) {
outlierEvents.push(eventName);
}
});
if (outlierEvents.length > 0) {
messageBody += `<li>❌ The following events have seen an event count increase/decrease of more than 15% compared with average counts. Consider investigating:<ul>`;
outlierEvents.forEach(event => {
const eventIndex = eventCountColumnA.findIndex(row => row[0] === event);
const percentageChange = eventCountColumnD[eventIndex][0].toLocaleString('en-US', { style: 'percent', minimumFractionDigits: 2 });
messageBody += `<li>${event} (${percentageChange})</li>`;
});
messageBody += "</ul></li>";
} else {
messageBody += "<li>✔️ Yesterday's individual event counts compared with previous days are within normal bounds</li>";
}
// GA4 Event Parameter Checks for ga4_event_param_check_sheet
messageBody += "</ol><br><b><u>GA4 Event Parameter Checks</u></b><br><br><ol>";
const paramCheckColumnA = ga4EventParamCheckSheet.getRange("A2:A").getValues();
const paramCheckColumnB = ga4EventParamCheckSheet.getRange("B2:B").getValues();
const paramCheckColumnC = ga4EventParamCheckSheet.getRange("C2:C").getValues();
const paramCheckColumnD = ga4EventParamCheckSheet.getRange("D2:D").getValues();
const paramCheckColumnE = ga4EventParamCheckSheet.getRange("E2:E").getValues();
const paramCheckColumnF = ga4EventParamCheckSheet.getRange("F2:F").getValues(); // Column F values
let newParams = {};
let disappearedParams = {};
paramCheckColumnE.forEach((row, index) => {
const status = row[0];
const paramName = paramCheckColumnB[index][0];
const eventName = paramCheckColumnA[index][0];
const countC = parseFloat(paramCheckColumnC[index][0]);
const countD = parseFloat(paramCheckColumnD[index][0]);
const statusF = paramCheckColumnF[index][0]; // Get the value from column F
// Only process if status is not 'New' or 'Disappeared' in column F
if (statusF !== 'New' && statusF !== 'Disappeared') {
if (status === 'New' && (countC > 100 || countD > 100)) {
if (!newParams[eventName]) newParams[eventName] = [];
newParams[eventName].push(paramName);
} else if (status === 'Disappeared' && (countC > 100 || countD > 100)) {
if (!disappearedParams[eventName]) disappearedParams[eventName] = [];
disappearedParams[eventName].push(paramName);
}
}
});
if (Object.keys(newParams).length > 0) {
messageBody += "<li>❌ New parameters detected:<ul>";
for (const [event, params] of Object.entries(newParams)) {
messageBody += `<li>${event}<ul>`;
params.forEach(param => messageBody += `<li>${param}</li>`);
messageBody += "</ul></li>";
}
messageBody += "</ul></li>";
} else if (Object.keys(disappearedParams).length > 0) {
messageBody += "<li>❌ Missing parameters:<ul>";
for (const [event, params] of Object.entries(disappearedParams)) {
messageBody += `<li>${event}<ul>`;
params.forEach(param => messageBody += `<li>${param}</li>`);
messageBody += "</ul></li>";
}
messageBody += "</ul></li>";
} else {
messageBody += "<li>✔️ No new or missing event parameters returned in yesterday's data above a significant threshold</li>";
}
// Check ga4_param_count_average for parameter count changes
const paramCountColumnA = ga4ParamCountAverageSheet.getRange("A2:A").getValues();
const paramCountColumnB = ga4ParamCountAverageSheet.getRange("B2:B").getValues();
const paramCountColumnC = ga4ParamCountAverageSheet.getRange("C2:C").getValues();
const paramCountColumnD = ga4ParamCountAverageSheet.getRange("D2:D").getValues();
const paramCountColumnE = ga4ParamCountAverageSheet.getRange("E2:E").getValues();
const paramCountColumnF = ga4ParamCountAverageSheet.getRange("F2:F").getValues(); // New column F check
let outlierParamEvents = {};
paramCountColumnE.forEach((row, index) => {
const percentageChange = parseFloat(row[0]);
const yesterdayCount = parseFloat(paramCountColumnC[index][0]);
const twoDaysAgoCount = parseFloat(paramCountColumnD[index][0]);
const eventName = paramCountColumnA[index][0];
const paramName = paramCountColumnB[index][0];
const statusF = paramCountColumnF[index][0]; // Get the value from column F
// Check if column F is not 'Yes' before processing
if (statusF !== 'Yes' && !isNaN(percentageChange) && (percentageChange > 0.20 || percentageChange < -0.20) && (yesterdayCount > 10000 || twoDaysAgoCount > 10000)) {
if (!outlierParamEvents[eventName]) outlierParamEvents[eventName] = [];
outlierParamEvents[eventName].push(paramName);
}
});
if (Object.keys(outlierParamEvents).length > 0) {
messageBody += "<li>❌ Change in event parameter count! There was a change of more than 20% in the counts for the below event parameters:<ol>";
for (const [event, params] of Object.entries(outlierParamEvents)) {
messageBody += `<li>${event}<ul>`;
params.forEach(param => messageBody += `<li>${param}</li>`);
messageBody += "</ul></li>";
}
messageBody += "</ol></li>";
} else {
messageBody += "<li>✔️ Yesterday's event parameter counts compared with previous days is within normal bounds.</li>";
}
messageBody += "</ol>"; // Close the ordered list
// Check ga4_type_param_averages for type parameter changes
messageBody += "<br><b><u>GA4 Type Parameter Checks</u></b><br><br>"; // Add <br> here for line break
const typeParamColumnB = ga4TypeParamAveragesSheet.getRange("B2:B").getValues();
const typeParamColumnC = ga4TypeParamAveragesSheet.getRange("C2:C").getValues();
const typeParamColumnD = ga4TypeParamAveragesSheet.getRange("D2:D").getValues();
const typeParamColumnE = ga4TypeParamAveragesSheet.getRange("E2:E").getValues();
let outlierTypes = [];
typeParamColumnE.forEach((row, index) => {
const percentageChange = parseFloat(row[0]);
const yesterdayCount = parseFloat(typeParamColumnC[index][0]);
const twoDaysAgoCount = parseFloat(typeParamColumnD[index][0]);
const typeName = typeParamColumnB[index][0];
if (!isNaN(percentageChange) && (percentageChange > 0.20 || percentageChange < -0.20) && (yesterdayCount > 1000 || twoDaysAgoCount > 1000)) {
outlierTypes.push(typeName);
}
});
if (typeParamColumnE.some(row => row[0] === 'New') || typeParamColumnE.some(row => row[0] === 'Disappeared')) {
// Check for new type parameters
const newTypes = typeParamColumnE
.map((row, index) => ({ typeName: typeParamColumnB[index][0], status: row[0] }))
.filter(row => row.status === 'New')
.map(row => row.typeName);
if (newTypes.length > 0) {
messageBody += `<li>❌ Oh wow! A new type parameter has appeared in yesterday's data. It's called:<ul>`;
newTypes.forEach(newType => {
messageBody += `<li>${newType}</li>`;
});
messageBody += "</ul></li>";
}
// Check for disappeared type parameters
const disappearedTypes = typeParamColumnE
.map((row, index) => ({ typeName: typeParamColumnB[index][0], status: row[0] }))
.filter(row => row.status === 'Disappeared')
.map(row => row.typeName);
if (disappearedTypes.length > 0) {
messageBody += `<li>❌ Oh no! A type parameter has disappeared from yesterday's data. It was:<ul>`;
disappearedTypes.forEach(disappearedType => {
messageBody += `<li>${disappearedType}</li>`;
});
messageBody += "</ul></li>";
}
}
if (outlierTypes.length > 0) {
messageBody += `<li>❌ Change in 'type' parameter count! The change in type parameter count between yesterday and previous days for certain types is greater than 20%. They are:<ul>`;
outlierTypes.forEach(type => {
const typeIndex = typeParamColumnB.findIndex(row => row[0] === type);
const percentageChange = typeParamColumnE[typeIndex][0].toLocaleString('en-US', { style: 'percent', minimumFractionDigits: 2 });
messageBody += `<li>${type} (${percentageChange})</li>`;
});
messageBody += "</ul></li>";
} else {
messageBody += "<li>✔️ Yesterday's type parameter counts compared with previous days are within normal bounds.</li>";
}
// Close the ordered list tags
messageBody += "</ol>";
// Send the email with the final message
MailApp.sendEmail({
to: emailRecipients,
subject: `GA4 Data Check Alert for ${formattedDateSubject}`,
htmlBody: messageBody
});
}
Verifying and investigating data alerts
The outputs from the alerts process are sent to the GOVUK GA4 Alerts Google group. If there are any flags in the data, then a member of this group can begin to investigate the issue(s). Not all flags raised by the alert process will warrant investigating, however. For example, if a rise in the count of ab_test event parameters is flagged, this might be expected behaviour if an AB test somewhere on GOV.UK has recently begun or concluded.
The output of this alert can be cross referenced with the GOV.UK GA4 data quality report which can be found here. This report visualises GA4 events, event type parameters and statistics by page locations over time. By using this report in combination with the alerts process, we should be able to verify and narrow down anomalies in the data, viewed in the context of longer time periods.
Get access to the GOV.UK GA4 Data Alerts Output
Those who would like to receive these alerts can request to join the GOVUK GA4 Alerts Google group. Please get in touch with the GOV.UK Insights and Analytics team if you would like adding to this list.
In the future we would like to make a web page or dashboard where we can present this information in some kind of ‘live status’ page, though this will probably come later once we have refined the data outputs in the email alert.