Skip to main content

GOV.UK Content Publishing Dates

Date sources and Publishing App differences

Content created using GOV.UK’s publishing apps report a number of date fields. These dates can refer to a number of milestones, including the date that a document was first created (the date that the first edition of that document was created or published), the date a piece of content was last edited, last publicly updated, or received a ‘major’ publishing update.

These dates are surfaced in a number of places including the knowledge graph, within GOV.UK’s GA4 data, and the publishing api, which is accessible to analysts via the the govuk-knowledge-graph.publishing_api data set. The dates are also shown in the interfaces of the publishing apps themselves, the main two being Whitehall and Mainstream (Publisher).

There are, however, differences in the way that these dates are determined between the different publishing apps which means that, when comparing the date values seen in any of the above sources, discrepancies can arise.

Determining ‘published dates’ for content analysis

For analysing content on GOV.UK, it is often useful to attach content meta data from various sources by which the data can be then grouped and sorted. One such piece of meta data would be publishing dates - e.g. when did a piece of content first appear on GOV.UK, and when was it last publicly updated?

Because of the issues described above, it is not always possible to determine the correct values from a single source, owing to the differences between the publishing apps, and how these sources have attempted to normalise the data into a single table.

One way of obtaining this information would instead be to combine data in the knowledge graph with the data from the publishing apps themselves. From cross referencing the dates reported within a publishing app’s interfaces, we have so far determined this to be a more accurate method of determining publishing dates for live GOV.UK content.

Code explanation

Below is an extract of code used by the GOV.UK content team’s analysts.

What it does in essence is retrieve core page data about pages on GOV.UK (regardless of whether they have received page views or not) from the search.page table. It then determines which publishing app these pages belong to, and (the important part here) how the first_published_at and page_last_update_at fields are determined.

first published date

Logic:

if the app is Publisher, use the create_at date in the publisher.actions table for first version of the page else use first_published_at from search.page

Specifically on the join used here: for Publisher pages, the first_published_at date is determined by getting the create_at field from the publisher.actions table, by joining the edition_id found in the editionsv1 table (which is a filtered version of the publisher.editions table which only returns the first version of an edition (version_number = 1)), with the edition_id in the publisher.actions table.

last updated date

Logic:

if the app is Publisher, use the create_at date in publisher.actions table for latest published version of the page else use public_updated_at from search.page

Specifically on the join used here: for Publisher, the page_last_update_at date is determined by getting the create_at field from the publisher.actions table, by joining the edition_id found in the editionsLatest table (which is a filtered version of the publisher.editions table which only returns the latest version of an edition (WHERE state = "published")), with the edition_id in the publisher.actions table.

See below for the full code and comments for more detailed explanations.

Code

WITH
-- ============================================================
-- Step 1: Base GOV.UK page data (KG extract)
-- ============================================================
-- PURPOSE: Extract core page data from the Knowledge Graph
-- SOURCE: `govuk-knowledge-graph.search.page`
-- DESCRIPTION: Retrieves all published pages excluding content-block-manager.
--              Creates slugs (by removing the www.gov.uk prefix), applies document type enrichment for finders,
--              and handles organisation name standardisation
-- ============================================================
kg_pages_extract AS (
SELECT
  p.url,                                                  
  -- Remove https://www.gov.uk prefix to create slug
  REGEXP_REPLACE(p.url, r'^https://www\.gov\.uk', '')
    AS slug,                                              

  -- 📋 COLUMN: publisher_slug (renamed from PublisherSlug)
  -- Special handling for Publisher app pages to extract correct slug format:
  -- Basically, if the app is 'Publisher' and the url contains /done/ or /help/, then return the full path after the gov.uk domain.
  -- For any other 'Publisher' page, return the first part of the path - this is to return just a single value for guides. So /enduring-power-attorney-duties/legal becomes /enduring-power-attorney-duties/
  -- Alongside this, retrieve a page's title, content_id and locale from the search.page table
  CASE
    WHEN p.publishing_app = "publisher" AND REGEXP_CONTAINS(p.url, r"https://www.gov.uk/(done|help)/")
    THEN REGEXP_REPLACE(p.url, r"https://www.gov.uk", "") 
    WHEN p.publishing_app = "publisher"
    THEN REGEXP_EXTRACT(p.url, r"https://www.gov.uk(/[^/]*)")
    ELSE REGEXP_REPLACE(p.url, r"https://www.gov.uk", "")
  END AS publisher_slug,
   p.title,                                                
  p.contentId,                                            
  p.locale,                                                

  -- Edits the 'document type' field, swapping out 'finder' for finder + publishing app source if the app can be identified. E.g. if the original document type value is 'finder' and the app is Whitehall, it becomes 'finder (Whitehall)'
  CASE
    WHEN p.documentType = 'finder' AND p.publishing_app = 'search-api' THEN 'finder (search-api)'
    WHEN p.documentType = 'finder' AND p.publishing_app = 'specialist-publisher' THEN 'finder (specialist-publisher)'
    WHEN p.documentType = 'finder' AND p.publishing_app = 'whitehall' THEN 'finder (whitehall)'
    ELSE p.documentType
  END AS documentType,
   p.publishing_app,                                      
  p.primary_organisation,                                 
   -- Convert organisations array to comma-separated string with name standardisation for MHCLG and OSSS
  CASE
    WHEN p.organisations IS NULL OR ARRAY_LENGTH(p.organisations) = 0 THEN CAST(NULL AS STRING)
    ELSE ARRAY_TO_STRING(
      ARRAY(
        SELECT
          -- Standardise organisation names that have changed
          CASE
            WHEN org = 'Department for Levelling Up, Housing and Communities' THEN 'Ministry of Housing, Communities and Local Government'
            WHEN org = 'Scotland Office' THEN 'Office of the Secretary of State for Scotland'
            ELSE org
          END
        FROM UNNEST(p.organisations) AS org
      ), ', ')
  END AS organisations,
   -- Extract first organisation from array with name standardisation as a field called first_tagged_org - not necessarily indictaor of importance,
   --e.g. https://www.gov.uk/check-national-insurance-record is DWP, HMRC and https://www.gov.uk/claim-national-insurance-refund is HMRC, DWP
   --The notes in Publisher suggest HMRC takes the lead on them both.
  CASE
    WHEN p.organisations[SAFE_OFFSET(0)] = 'Department for Levelling Up, Housing and Communities' THEN 'Ministry of Housing, Communities and Local Government'
    WHEN p.organisations[SAFE_OFFSET(0)] = 'Scotland Office' THEN 'Office of the Secretary of State for Scotland'
    ELSE p.organisations[SAFE_OFFSET(0)]
  END AS first_tagged_org,
   p.first_published_at AS kg_first_published_at,          
  p.public_updated_at AS kg_public_updated_at             
FROM `govuk-knowledge-graph.search.page` p
WHERE p.publishing_app != 'content-block-manager'         
),

-- ============================================================
-- Step 2: Corrected publishing dates - Publisher app pages
-- ============================================================
-- PURPOSE: Get accurate publishing dates for Publisher app content
-- SOURCES: `govuk-knowledge-graph.publisher.editions`
--          `govuk-knowledge-graph.publisher.actions`
-- DESCRIPTION: Publisher app dates in KG can be inaccurate, so we extract
--              correct dates from the editions and actions tables. We get
--              both the latest published version and the first version (v1)
--              to determine first_published_at and last_updated_at
-- ============================================================

-- Get latest published editions from the publisher.editions table
editionsLatest AS (
SELECT
  slug AS editionsLatest_slug,                            
  CONCAT("/", slug) AS editionsLatest_slashSlug,          
  id AS editionsLatest_id,                                
  state AS editionsLatest_state,                          
  version_number AS editionsLatest_version_number,        
  created_at AS editionsLatest_created_at,                
  updated_at AS editionsLatest_updated_at                 
FROM `govuk-knowledge-graph.publisher.editions`
WHERE state = "published"
),

-- Get publish actions for latest editions from the publisher.actions table
actionsLatest AS (
SELECT
  edition_id AS actionsLatest_edition_id,                 
  request_type AS actionsLatest_request_type,             
  create_at AS actionsLatest_create_at,                   
  updated_at AS actionsLatest_updated_at                  
FROM `govuk-knowledge-graph.publisher.actions`
WHERE REGEXP_CONTAINS(request_type, "^(publish|published)$")
),

-- Get first version (v1) editions for original publish date from the publisher.editions table
editionsv1 AS (
SELECT
  slug AS editionsv1_slug,                                
  CONCAT("/", slug) AS editionsv1_slashSlug,              
  id AS editionsv1_id,                                    
  state AS editionsv1_state,                              
  version_number AS editionsv1_version_number,            
  created_at AS editionsv1_created_at,                    
  updated_at AS editionsv1_updated_at                     
FROM `govuk-knowledge-graph.publisher.editions`
WHERE version_number = 1
),

-- Get publish actions for v1 editions from the publisher.actions table
actionsv1 AS (
SELECT
  edition_id AS actionsv1_edition_id,                     
  request_type AS actionsv1_request_type,                 
  create_at AS actionsv1_create_at,                       
  updated_at AS actionsv1_updated_at                      
FROM `govuk-knowledge-graph.publisher.actions`
WHERE REGEXP_CONTAINS(request_type, "^(publish|published)$")

),

-- ============================================================
-- Step 3: Pre-join data preparation with dates
-- ============================================================
-- PURPOSE: Prepare data before org filling logic
-- SOURCES: editionsLatest, actionsLatest, editionsv1, actionsv1, kg_pages_extract
-- JOINS:
--   - RIGHT JOIN kg_pages_extract with editionsLatest (for Publisher dates)
--   - LEFT JOIN actionsLatest (for latest publish action dates)
--   - LEFT JOIN editionsv1 (for original v1 edition)
--   - LEFT JOIN actionsv1 (for original publish action dates)
-- DESCRIPTION: Combines page data with corrected dates. This happens BEFORE
--              the org filling logic so we can then join with org metadata
--              after filling
-- ============================================================
--
pre_filled_data AS (
SELECT
  kg.url,
  kg.slug,
  kg.publisher_slug,
  kg.title,
  kg.contentId,
  kg.locale,
  kg.documentType,
  kg.publishing_app,
  -- Store original values before null-filling logic
  kg.primary_organisation AS original_primary_organisation,
  kg.organisations,
  kg.first_tagged_org AS original_first_tagged_org,








  -- Calculate corrected first_published_at date
  -- For Publisher app: use v1 action date, fallback to KG date.
  -- Extra context: For Publisher, the first_published_at date is determined by getting the 'create_at' field from the publisher.actions table, by joining the edition_id found in the editionsv1 table (which is a filtered version of the publisher.editions table which only returns the first version of an edition (version_number = 1)), with the edition_id in the publisher.actions table.
  -- For other apps: use KG date
  CASE
    WHEN kg.publishing_app = "publisher"
      THEN COALESCE(
        DATE(MIN(actionsv1.actionsv1_create_at)),
        MAX(DATE(kg.kg_first_published_at))
      )
    ELSE MAX(DATE(kg.kg_first_published_at))
  END AS first_published_at,


  -- Calculate corrected last update date
  -- For Publisher app: use latest action date, fallback to KG date
 -- Extra context: For Publisher, the page_last_update_at date is determined by getting the 'create_at' field from the publisher.actions table, by joining the edition_id found in the editionsLatest table (which is a filtered version of the publisher.editions table which only returns the latest version of an edition (WHERE state = "published")), with the edition_id in the publisher.actions table.
  -- For other apps: use KG date
  CASE
    WHEN kg.publishing_app = "publisher"
      THEN COALESCE(
        DATE(MAX(actionsLatest.actionsLatest_create_at)),
        MAX(DATE(kg.kg_public_updated_at))
      )
    ELSE MAX(DATE(kg.kg_public_updated_at))
  END AS page_last_update_at


FROM editionsLatest
-- RIGHT JOIN to keep all KG pages, even those not in Publisher
RIGHT JOIN kg_pages_extract kg
  ON editionsLatest.editionsLatest_slashSlug = kg.publisher_slug
  AND kg.publishing_app = "publisher"


-- LEFT JOIN to get latest publish action dates
LEFT JOIN actionsLatest
  ON editionsLatest.editionsLatest_id = actionsLatest.actionsLatest_edition_id
  AND kg.publishing_app = "publisher"


-- LEFT JOIN to get v1 editions
LEFT JOIN editionsv1
  ON editionsLatest.editionsLatest_slashSlug = editionsv1.editionsv1_slashSlug
  AND kg.publishing_app = "publisher"


-- LEFT JOIN to get v1 publish action dates
LEFT JOIN actionsv1
  ON editionsv1.editionsv1_id = actionsv1.actionsv1_edition_id
  AND kg.publishing_app = "publisher"


GROUP BY
  kg.url, kg.slug, kg.publisher_slug, kg.title, kg.contentId, kg.locale,
  kg.documentType, kg.publishing_app, kg.primary_organisation,
  kg.organisations, kg.first_tagged_org
)

SELECT * from pre_filled_data
This page was last reviewed on 12 December 2025. It needs to be reviewed again on 12 June 2026 .