Google Cloud Platform logs
This page is a work in progress.
The Google Cloud Platform (GCP) logs contain data generated by use of GCP resources.
Content
The logs datasets contain the Cloud Audit logs, minus any data from services we choose to exclude, exported into a BigQuery dataset.
These datasets are set up for each GCP project individually, and so the history accumulated varies by project.
The all_gcp_logs dataset, which collates logs data across multiple GCP projects, was set up in November 2023.
Access
Access to this data is limited to GCP admins and analysts who require it for specific use cases. Contact the #data-engineering Slack channel if you would like access.
Location
The logs data for each project is stored within that project in a dataset called gcp_logs.
Logs data for many of our projects is also routed into a dataset called all_gcp_logs in the gds-bq-reporting project, which is used to enable analysis and reporting across multiple projects’ logs.
Set-up
To configure the logs for a project, a log storage bucket is created in the europe-west2 region with ‘Log Analytics’ enabled and a BigQuery dataset linked to it.
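As an illustrative sketch (not our exact configuration), the bucket and linked dataset could be created in Terraform roughly as follows. The resource names, the var.project_id variable and the 365-day retention are assumptions, not taken from our real set-up:

# A dedicated log bucket with Log Analytics enabled.
resource "google_logging_project_bucket_config" "gcp_logs" {
  project          = var.project_id # assumed variable naming the project
  location         = "europe-west2"
  bucket_id        = "gcp_logs"
  retention_days   = 365  # assumption, matching the 1-year retention noted below
  enable_analytics = true # enables 'Log Analytics' on the bucket
}

# Linking a BigQuery dataset makes the bucket's logs queryable from BigQuery.
resource "google_logging_linked_dataset" "gcp_logs" {
  link_id = "gcp_logs" # becomes the BigQuery dataset ID
  bucket  = google_logging_project_bucket_config.gcp_logs.id
}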
For the multi-project all_gcp_logs dataset, sinks are set up in each source project’s Log Router to write that project’s logs into the multi_project log bucket in the gds-bq-reporting project.
Currently, logs with the IDs cloudaudit.googleapis.com/activity, externalaudit.googleapis.com/activity, cloudaudit.googleapis.com/system_event, externalaudit.googleapis.com/system_event, cloudaudit.googleapis.com/access_transparency, and externalaudit.googleapis.com/access_transparency are excluded from all logs datasets.
Example Terraform configuration
resource "google_service_account" "log_writer" {
account_id = "log-writer"
display_name = "Log writer"
description = "For writing logs to a bucket in another project"
}
data "google_iam_policy" "service_account_log_writer" {
binding {
role = "roles/iam.serviceAccountTokenCreator"
members = [
"serviceAccount:service-${var.project_number}@gcp-sa-logging.iam.gserviceaccount.com",
]
}
}
resource "google_service_account_iam_policy" "log_writer" {
service_account_id = google_service_account.log_writer.name
policy_data = data.google_iam_policy.service_account_log_writer.policy_data
}
resource "google_logging_project_sink" "log_sink" {
name = "log-sink"
destination = "logging.googleapis.com/projects/gds-bq-reporting/locations/europe-west2/buckets/multi_project"
exclusions {
name = "standard-exclusions"
description = "Standard exclusions https://docs.data-community.publishing.service.gov.uk/data-sources/gcp-logs/#set-up"
filter = <<-EOT
logName=(
"projects/govuk-knowledge-graph/logs/cloudaudit.googleapis.com%2Factivity"
OR "projects/govuk-knowledge-graph/logs/cloudaudit.googleapis.com%2Fsystem_event"
OR "projects/govuk-knowledge-graph/logs/cloudaudit.googleapis.com%2Faccess_transparency"
OR "projects/govuk-knowledge-graph/logs/externalaudit.googleapis.com%2Factivity"
OR "projects/govuk-knowledge-graph/logs/externalaudit.googleapis.com%2Fsystem_event"
OR "projects/govuk-knowledge-graph/logs/externalaudit.googleapis.com%2Faccess_transparency"
)
EOT
}
unique_writer_identity = true
custom_writer_identity = google_service_account.log_writer.member
}
An IAM binding is also required in the target project. IAM can be configured in several different ways, so this example is fragmentary.
binding {
  role = "roles/logging.bucketWriter"
  members = [
    google_service_account.log_writer.member,
  ]
}
The target project must give the log_writer service account the role roles/logging.bucketWriter at the project level.
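A minimal sketch of that grant, assuming it is managed additively with google_project_iam_member and that the service account’s email is known to the target project’s configuration (var.source_project_id is an illustrative variable):

# Grant the log_writer service account from the source project permission
# to write log entries into log buckets in the target project.
resource "google_project_iam_member" "log_writer_bucket_writer" {
  project = "gds-bq-reporting"
  role    = "roles/logging.bucketWriter"
  member  = "serviceAccount:log-writer@${var.source_project_id}.iam.gserviceaccount.com" # assumed variable
}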
Schema
All of these datasets have the default Cloud Audit logs schema.
Retention
This data is stored for 1 year.