GovGraph (GOV.UK Knowledge Graph)

This page is a work in progress.

The tables of the GOV.UK Knowledge Graph (GovGraph) hold GOV.UK content data in BigQuery for analytical workloads.

Access

Access to the BigQuery dataset is limited to GDS staff.

For access, contact the #data-engineering community.

Location

The data is located in BigQuery in the govuk-knowledge-graph project.
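A minimal sketch of how a query might reference these tables, assuming you have been granted access, have the `google-cloud-bigquery` Python package installed, and have application-default credentials configured. The helper names `table_ref` and `run_query` and the `billing_project` parameter are hypothetical. Only `bigquery.Client` and `Client.query(...).result()` come from the real client library. Backticks quote the fully qualified table ID, which is a safe habit because the project ID contains hyphens.

```python
"""Hypothetical helpers for querying GovGraph tables in BigQuery."""
from typing import Optional

PROJECT = "govuk-knowledge-graph"


def table_ref(dataset: str, table: str, project: str = PROJECT) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table ID."""
    return f"`{project}.{dataset}.{table}`"


def run_query(sql: str, billing_project: Optional[str] = None) -> list:
    """Run `sql` in BigQuery and return the result rows.

    Assumes google-cloud-bigquery is installed and that application-default
    credentials with GovGraph access are configured. The import is deferred
    so that table_ref works without the package.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return list(client.query(sql).result())


# Example query: the ten most viewed pages in GovSearch.
TOP_PAGES_SQL = f"""
SELECT url, title, page_views
FROM {table_ref("search", "page")}
ORDER BY page_views DESC
LIMIT 10
"""

# rows = run_query(TOP_PAGES_SQL)  # needs credentials and GovGraph access
```

The project used for billing may differ from `govuk-knowledge-graph`; which project to bill queries to is an assumption to check with #data-engineering.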

Set-up

The GovGraph Google Cloud projects also include the infrastructure for the GovSearch app, which uses data from GovGraph.

Read the documentation in the GitHub repository.

Schema

Tables

The schemas of some of the more heavily used Knowledge Graph tables are detailed below.

search.page

The govuk-knowledge-graph.search.page table includes everything that is in GovSearch.

In this table, different sections or chapters of a content item are kept as separate ‘pages’, instead of being grouped as content items under the contentId. ‘Gone’ and ‘redirect’ pages are excluded from this table. More detailed documentation can be found on GitHub.
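Because one content item can span several rows (one per section or chapter URL), counting rows is not the same as counting content items. The sketch below shows a BigQuery query that counts both, plus a plain-Python equivalent that groups rows by `contentId`. The example rows, URLs and IDs are made up for illustration.

```python
from collections import defaultdict

# BigQuery SQL (for reference, not executed here): pages versus content items.
SQL = """
SELECT COUNT(*) AS n_pages, COUNT(DISTINCT contentId) AS n_content_items
FROM `govuk-knowledge-graph.search.page`
"""


def pages_by_content_item(rows):
    """Map each contentId to the sorted list of page URLs that share it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["contentId"]].append(row["url"])
    return {cid: sorted(urls) for cid, urls in grouped.items()}


# Hypothetical rows: a two-chapter guide and a single-page publication.
rows = [
    {"contentId": "guide-1", "url": "https://www.gov.uk/example-guide"},
    {"contentId": "guide-1", "url": "https://www.gov.uk/example-guide/eligibility"},
    {"contentId": "pub-1", "url": "https://www.gov.uk/government/publications/example"},
]

items = pages_by_content_item(rows)
print(len(rows), "pages,", len(items), "content items")  # 3 pages, 2 content items
```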

field name | mode | type | description | notes
url | NULLABLE | STRING | URL of a page | Includes hostname and protocol, e.g. https://www.gov.uk/government/publications/low-pay-commission-research-2024
documentType | REQUIRED | STRING | The kind of thing that a page is about |
contentId | REQUIRED | STRING | The ID of the content item of a page | Multiple URLs can have the same contentId
locale | REQUIRED | STRING | The ISO 639-1 two-letter code of the language of an edition on GOV.UK |
publishing_app | REQUIRED | STRING | The application that published a content item on GOV.UK |
first_published_at | NULLABLE | TIMESTAMP | The date that a page was first published. Automatically determined by the Publishing API, unless overridden by the publishing application. |
public_updated_at | NULLABLE | TIMESTAMP | When a page was last significantly changed (a major update). Shown to users. Automatically determined by the Publishing API, unless overridden by the publishing application. |
publisher_updated_at | NULLABLE | TIMESTAMP | When a page was last changed in the Publisher app. More meaningful than ‘updated_at’ in the Publishing API and Content API, which also changes when editions are created for technical rather than editorial reasons. Editors of mainstream pages also tend not to use ‘public_updated_at’. |
withdrawn_at | NULLABLE | TIMESTAMP | The date the page was withdrawn |
withdrawn_explanation | NULLABLE | STRING | The explanation for withdrawing a page |
page_views | NULLABLE | INTEGER | Number of page views recorded in GA4 over a recent 7-day period |
title | NULLABLE | STRING | The title of a page |
description | NULLABLE | STRING | Description of a page |
text | NULLABLE | STRING | The content of the page as plain text extracted from the HTML | Null for certain document types, such as contact pages, due to the way the content is generated
taxons | REPEATED | STRING | Array of titles of taxons that the page is tagged to, and their ancestors |
primary_organisation | NULLABLE | STRING | Title of the primary organisation that published the page |
organisations | REPEATED | STRING | Array of titles of organisations that published the page |
people | REPEATED | STRING | Array of names of people who are associated with the page |
organisations_ancestry | REPEATED | STRING | Array of titles of organisations (and any parent organisations) that published the page |
hyperlinks | REPEATED | RECORD | Array of hyperlinks from the body of the page |
phone_numbers | REPEATED | STRING | Array of phone numbers from the body and metadata of the page |
is_political | NULLABLE | BOOLEAN | Indicator of whether the page is political. Pages where this is true, and that were published by a previous government, are displayed in ‘history mode’ with a prominent message drawing attention to the fact. |
government | NULLABLE | STRING | Title of the government that published the page, if the page is political |
hyperlinks.link_url | | STRING | Link URL |
hyperlinks.link_type | | STRING | Type of link |
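`hyperlinks` is a REPEATED RECORD, so each page row carries an array of `{link_url, link_type}` structs. In BigQuery such arrays are usually flattened with `UNNEST`. The sketch shows that query and an in-memory Python equivalent over made-up rows. The `link_type` values used here ("internal", "external") are hypothetical, since the possible values are not listed on this page.

```python
# BigQuery SQL: one output row per (page, hyperlink) pair.
# The comma join with UNNEST is a correlated cross join, so pages whose
# hyperlinks array is empty do not appear in the output.
SQL = """
SELECT p.url, h.link_url, h.link_type
FROM `govuk-knowledge-graph.search.page` AS p,
  UNNEST(p.hyperlinks) AS h
"""


def unnest_hyperlinks(pages):
    """Yield (page_url, link_url, link_type) tuples, like the query above."""
    for page in pages:
        # Treat a missing or null array the same as an empty one.
        for link in page.get("hyperlinks") or []:
            yield page["url"], link["link_url"], link["link_type"]


# Hypothetical rows; link_type values are illustrative only.
pages = [
    {
        "url": "https://www.gov.uk/example-guide",
        "hyperlinks": [
            {"link_url": "https://www.gov.uk/example-form", "link_type": "internal"},
            {"link_url": "https://example.org/", "link_type": "external"},
        ],
    },
    {"url": "https://www.gov.uk/no-links", "hyperlinks": []},
]

for row in unnest_hyperlinks(pages):
    print(row)
```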

public.publishing_api_editions_current

The govuk-knowledge-graph.public.publishing_api_editions_current table includes one record per ‘document’ as it currently appears on the GOV.UK website and in the Content API. Here, a ‘document’ means a content item in a particular locale: each content item has one or more documents, but at most one document per locale (unique key: content_id, locale).
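Since (content_id, locale) is the unique key, a quick sanity check before joining on it is to look for duplicate keys, which should return nothing. The sketch shows such a query and a Python equivalent over hypothetical rows; the content IDs are made up.

```python
from collections import Counter

# BigQuery SQL: should return zero rows if (content_id, locale) is unique.
SQL = """
SELECT content_id, locale, COUNT(*) AS n
FROM `govuk-knowledge-graph.public.publishing_api_editions_current`
GROUP BY content_id, locale
HAVING COUNT(*) > 1
"""


def duplicate_keys(rows):
    """Return the (content_id, locale) pairs that occur more than once."""
    counts = Counter((r["content_id"], r["locale"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical documents: one content item in English and in Welsh.
rows = [
    {"content_id": "abc", "locale": "en"},
    {"content_id": "abc", "locale": "cy"},
]
print(duplicate_keys(rows))  # []
```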

In this table, the sections or chapters of a content item are not stored separately: they are grouped together under the content item's content_id and shared base_path. More detailed documentation can be found on GitHub.

field name | mode | type | description | notes
content_id | NULLABLE | STRING | |
locale | NULLABLE | STRING | |
id | NULLABLE | INTEGER | |
title | NULLABLE | STRING | |
public_updated_at | NULLABLE | TIMESTAMP | |
publishing_app | NULLABLE | STRING | |
rendering_app | NULLABLE | STRING | |
update_type | NULLABLE | STRING | |
phase | NULLABLE | STRING | |
analytics_identifier | NULLABLE | STRING | |
updated_at | NULLABLE | TIMESTAMP | |
document_type | NULLABLE | STRING | |
schema_name | NULLABLE | STRING | |
first_published_at | NULLABLE | TIMESTAMP | |
base_path | NULLABLE | STRING | The page path (not including hostname, protocol, or any query strings). Note that different chapters of a guide have the same base path in this table, although they appear under different URLs on the site. |
document_id | NULLABLE | INTEGER | |
description | NULLABLE | STRING | |
published_at | NULLABLE | TIMESTAMP | |
details | NULLABLE | JSON | |
routes | NULLABLE | JSON | |
redirects | NULLABLE | JSON | |
unpublishing_type | NULLABLE | STRING | |
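search.page stores full URLs, while this table stores base_path without hostname, protocol or query string, and the chapters of a guide share one base_path. Matching the two tables on path would therefore miss chapter URLs. The sketch below joins on content ID and locale instead, assuming each search.page row's contentId and locale match at most one row here. `url_to_path` is a hypothetical helper built on Python's standard `urllib.parse.urlsplit`, useful when you do need to compare a URL with a base_path. The guide URL is made up.

```python
from urllib.parse import urlsplit

# BigQuery SQL: attach each page to its current document via the shared key.
SQL = """
SELECT p.url, e.base_path, e.schema_name
FROM `govuk-knowledge-graph.search.page` AS p
JOIN `govuk-knowledge-graph.public.publishing_api_editions_current` AS e
  ON p.contentId = e.content_id AND p.locale = e.locale
"""


def url_to_path(url: str) -> str:
    """Drop protocol, hostname, query string and fragment from a URL."""
    return urlsplit(url).path


page_url = "https://www.gov.uk/government/publications/low-pay-commission-research-2024"
print(url_to_path(page_url))
# /government/publications/low-pay-commission-research-2024

# Hypothetical guide chapter: its path is longer than the guide's base_path,
# so a path-based match against publishing_api_editions_current would fail.
chapter_url = "https://www.gov.uk/example-guide/eligibility"
print(url_to_path(chapter_url))
# /example-guide/eligibility
```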

This page was last reviewed on 28 May 2025. It needs to be reviewed again on 28 November 2025.