
GOV.UK GA4 access logs

The GOV.UK GA4 access logs dataset records how the GOV.UK Google Analytics 4 (GA4) data is accessed via the Google Analytics Data API. This covers use of the GOV.UK GA4 user interface and Looker Studio connections, as well as direct queries to the API, but does not cover use of the data exported to BigQuery.

Content

Data was first collected into this dataset on 18 July 2023.

The fields in this dataset and their descriptions can be seen in the schema table below.

Access

Access to the BigQuery dataset is limited to GA4 user admins. However, the GA4 usage report which visualises this data is shared with all GDS performance analysts and the Data Services Google group.

Summarised data is also provided to SPOCs for their department.

Location

The data is located in BigQuery under the ga4-analytics-352613.ga4_logs dataset, in the GA4 Analytics project.

The ga4_logs table in this dataset is partitioned on the epoch_time_micros timestamp field.
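
Filtering on the partitioning field should limit how much of the table a query has to scan. The following is a minimal sketch rather than an established query: it assumes the table is ga4_logs.ga4_logs in the project above, that the google-cloud-bigquery client library (with pandas support) is installed, and that the caller has access to the dataset. The function names and the per-domain summary are illustrative only.

```python
from datetime import date, timedelta

# Assumed fully qualified table name: project.dataset.table
TABLE = "ga4-analytics-352613.ga4_logs.ga4_logs"


def build_usage_query(start: date, end: date) -> str:
    """Return SQL summarising access per email domain for start <= day <= end."""
    if end < start:
        raise ValueError("end must not be before start")
    # Half-open range on the partitioning field: [start, end + 1 day)
    upper = end + timedelta(days=1)
    return f"""
SELECT
  domain,
  SUM(access_count) AS total_access_count,
  SUM(api_tokens_consumed) AS total_tokens_consumed
FROM `{TABLE}`
WHERE epoch_time_micros >= TIMESTAMP('{start.isoformat()}')
  AND epoch_time_micros < TIMESTAMP('{upper.isoformat()}')
GROUP BY domain
ORDER BY total_access_count DESC
""".strip()


def run_usage_query(start: date, end: date):
    """Run the query and return a DataFrame (requires BigQuery access)."""
    # Imported here so the query builder can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project="ga4-analytics-352613")
    return client.query(build_usage_query(start, end)).to_dataframe()
```

For example, run_usage_query(date(2025, 1, 1), date(2025, 1, 31)) would summarise January 2025.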

Set-up

Data collection

This data is generated by querying the Google Analytics Admin API. A Cloud Scheduler job triggers a Google Cloud Run function every day at 6am GMT; the function retrieves the data and appends it to the ga4_logs table in the dataset described above.

The Cloud Run function code can be seen in the ga4-access-report repository on GitHub.

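If the table fails to populate for any reason, it can be backfilled by running the following code in a notebook (the lines beginning with ! are shell commands), with the date to backfill set in the final line: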

!pip install -q google-analytics-admin

!gcloud auth application-default login --project=ga4-analytics-352613 --scopes=https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/cloud-platform

from datetime import datetime
import re

from google.analytics.admin import AnalyticsAdminServiceClient
from google.auth import default
import pandas as pd

SCOPES = [
    'https://www.googleapis.com/auth/analytics.readonly',
    'https://www.googleapis.com/auth/bigquery']
PROJECT = 'ga4-analytics-352613'
creds, _ = default(
    scopes=SCOPES,
    default_scopes=SCOPES,
    quota_project_id=PROJECT)
GA4_ENTITY = 'properties/330577055'


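def get_access_report(n):
    """Run the GA4 access report for a single day; n is a 'YYYY-MM-DD' date string."""
    # No credentials are passed, so the client uses Application Default Credentials
    # (set up by the gcloud login above).
    client = AnalyticsAdminServiceClient()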
    access_dict = {
      "entity": GA4_ENTITY,
      "limit": 100000,
      "date_ranges": [
        {
          "start_date": f"{n}",
          "end_date": f"{n}"
        }
      ],
      "dimensions": [
        {
          "dimension_name": "epochTimeMicros"
        },
        {
          "dimension_name": "userEmail"
        },
        {
          "dimension_name": "accessMechanism"
        },
        {
          "dimension_name": "accessorAppName"
        },
        {
          "dimension_name": "dataApiQuotaCategory"
        },
        {
          "dimension_name": "reportType"
        }
      ],
      "metrics": [
        {
          "metric_name": "accessCount"
        },
        {
          "metric_name": "dataApiQuotaPropertyTokensConsumed"
        }
      ]
    }

    access_records = client.run_access_report(access_dict)
    return access_records


def format_access_report(response):
    """Flatten the access report response into a DataFrame matching the ga4_logs schema."""
    access_list = []

    for row in response.rows:
        dims = {}

        for i, dimension_value in enumerate(row.dimension_values):
            dimension_name = response.dimension_headers[i].dimension_name
            if dimension_name.endswith("Micros"):
                # Convert microseconds since Unix Epoch to datetime object.
                dimension_value_formatted = datetime.utcfromtimestamp(
                    int(dimension_value.value) / 1000000
                )
            else:
                dimension_value_formatted = dimension_value.value
            dims[dimension_name] = dimension_value_formatted

        for i, metric_value in enumerate(row.metric_values):
            metric_name = response.metric_headers[i].metric_name
            dims[metric_name] = metric_value.value
        access_list.append(dims)

    df = pd.DataFrame(access_list)
    df = df.rename(columns={
      'epochTimeMicros': 'epoch_time_micros',
      'userEmail': 'user_email',
      'accessMechanism': 'access_mechanism',
      'accessorAppName': 'accessor_app_name',
      'dataApiQuotaCategory': 'api_quota_category',
      'reportType': 'report_type',
      'accessCount': 'access_count',
      'dataApiQuotaPropertyTokensConsumed': 'api_tokens_consumed'})

    df['access_count'] = pd.to_numeric(df['access_count'], errors='coerce')
    df['api_tokens_consumed'] = pd.to_numeric(df['api_tokens_consumed'], errors='coerce')
    df['domain'] = df['user_email'].apply(lambda x: ''.join(re.findall(r'(@.*$)', str(x))))

    return df


def send_to_bq(df):
    """Append the formatted access report to the ga4_logs.ga4_logs table."""
    df.to_gbq(
        'ga4_logs.ga4_logs',
        project_id=PROJECT,
        chunksize=None,
        reauth=False,
        if_exists='append',
        auth_local_webserver=True,
        table_schema=None,
        location=None,
        credentials=creds
    )


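def run(n='YYYY-MM-DD'):
    """Fetch, format and upload the access report for the date n ('YYYY-MM-DD')."""
    access_records = get_access_report(n)
    if not access_records.rows:
        # Nothing was returned for this date, so there is nothing to upload.
        print(f"No access records returned for {n}")
        return "no data"
    df = format_access_report(access_records)
    try:
        send_to_bq(df)
        return "all good"
    except Exception as e:
        # Print a sample of the data alongside the error to help diagnose the failure.
        print(df.shape)
        print(df.head(n=5))
        print(e)
        return "all bad"


# Insert the date to be backfilled here, in YYYY-MM-DD format
run(n='2025-01-01')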
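
If several consecutive days are missing, the dates can be generated rather than entered one at a time. The following is a minimal sketch, assuming the run function above is already defined in the same session. dates_between and backfill_range are hypothetical helpers and are not part of the ga4-access-report repository.

```python
from datetime import date, timedelta


def dates_between(start: str, end: str) -> list[str]:
    """Return every date from start to end inclusive as 'YYYY-MM-DD' strings."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if last < first:
        raise ValueError("end must not be before start")
    return [(first + timedelta(days=i)).isoformat()
            for i in range((last - first).days + 1)]


def backfill_range(start: str, end: str, run_for_date) -> dict[str, str]:
    """Call run_for_date once per day and collect each result keyed by date."""
    return {day: run_for_date(day) for day in dates_between(start, end)}


# Usage, assuming `run` from the backfill code above:
# backfill_range('2025-01-01', '2025-01-03', run)
```

Because the upload appends to the table, running the backfill for a date that has already been loaded would add duplicate rows, so it is worth checking which dates are missing first.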

Schema

Field name | Type | Mode | Description
epoch_time_micros | TIMESTAMP | NULLABLE | The time at which the GA user accessed GA reporting data. The API reports this as microseconds since the Unix epoch; it is stored here as a timestamp
user_email | STRING | NULLABLE | The user’s email address
access_mechanism | STRING | NULLABLE | The mechanism through which a user accessed GA reporting data, for example ‘Google Analytics User Interface’ or ‘Google Analytics API’
accessor_app_name | STRING | NULLABLE | The name of the application that accessed Google Analytics reporting data, for example ‘Looker Studio’ or ‘Power BI’
api_quota_category | STRING | NULLABLE | The quota category for the Data API request, for example ‘Core’ or ‘Realtime’
report_type | STRING | NULLABLE | The type of reporting data that the GA user accessed, for example ‘Realtime’ or ‘Free form exploration’
access_count | INTEGER | NULLABLE | The number of times GA reporting data was accessed. Note that every report viewed can result in one or more data access events
api_tokens_consumed | INTEGER | NULLABLE | The number of property quota tokens consumed for Data API requests
domain | STRING | NULLABLE | The email domain, taken from the user’s email address
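
The two derived fields, epoch_time_micros and domain, can be illustrated in isolation. The sketch below uses the same regular expression as the backfill code and a timezone-aware equivalent of its timestamp conversion. The function names and sample values are invented for illustration.

```python
import re
from datetime import datetime, timezone


def micros_to_datetime(micros: str) -> datetime:
    """Convert microseconds since the Unix epoch (a string from the API) to UTC."""
    return datetime.fromtimestamp(int(micros) / 1_000_000, tz=timezone.utc)


def email_domain(email: str) -> str:
    """Return the address from '@' onwards, or '' if there is no '@'."""
    return ''.join(re.findall(r'(@.*$)', str(email)))


print(micros_to_datetime('1735689600000000'))   # 2025-01-01 00:00:00+00:00
print(email_domain('analyst@example.gov.uk'))   # @example.gov.uk
```

Note that the stored domain value keeps the leading @ character.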

Retention

Data is currently retained for 2 years.

This page was last reviewed on 3 February 2025. It needs to be reviewed again on 3 August 2025.