
Use the GOV.UK Content Store database

The Content Store is a MongoDB database of almost all the content published on GOV.UK.

The Content Store does not hold everything that appears on GOV.UK. It holds the content itself, but not dynamic elements such as top links on taxon pages, navigation elements, or search result pages.

Contact a GOV.UK Data Products software developer through the GOV.UK Data Products Slack channel for more information.

Consider using the GOV.UK mirror if you need a more representative data source of what users actually see on the website.

If you want to access a database that has a copy of every version of every page ever drafted or published on GOV.UK, use the GOV.UK publishing database.

Get access to the Content Store

To access the Content Store, you:

  • sign into AWS and select the correct AWS role
  • download a copy of the production version of the Content Store to your local machine
  • query the local production version of the Content Store using Docker

Sign into AWS and select the correct AWS role

  1. Sign into AWS. See the GOV.UK Developer Docs on getting AWS access for more information.

  2. Select your name in the top right of the screen and select Switch roles.

  3. Under Account, select govuk-infrastructure-integration or 210287912431.

  4. Under Role, select govuk-datascienceusers.

  5. You can enter any text into Display name or leave this field empty.

  6. You can select any colour in Colour. Best practice is to select green for integration, amber for staging and red for production.

  7. Select Switch Role.

Download a copy of the production version of the Content Store

  1. Go to the govuk-integration-database-backups AWS S3 bucket and then go to the mongo-api folder.

  2. Download a copy of the production version of the Content Store. The production version file is {DATETIME}-content_store_production.gz, where {DATETIME} is a date and time in YYYY-MM-DDTHH_MM_SS format.

  3. Run the following in the command line to extract the content_items.bson file from the content_store_production folder of the downloaded production version file:

    tar -xvf PATH/TO/DATETIME-content_store_production.gz content_store_production/content_items.bson
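
If you prefer to script steps 2 and 3, the following is a minimal Python sketch that uses only the standard library. It assumes the backup is a gzip-compressed tar archive (as the tar command above implies) and that the member path inside it is exactly content_store_production/content_items.bson. The function names backup_timestamp, latest_backup and extract_content_items are hypothetical and not part of any GOV.UK tooling.

```python
import re
import tarfile
from datetime import datetime
from pathlib import Path

# Filename pattern from step 2: {DATETIME}-content_store_production.gz,
# where {DATETIME} is in YYYY-MM-DDTHH_MM_SS format.
BACKUP_NAME = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}_\d{2}_\d{2})-content_store_production\.gz$"
)

# Assumption: the archive stores the file under exactly this member path.
MEMBER = "content_store_production/content_items.bson"


def backup_timestamp(filename):
    """Return the datetime encoded in a backup filename, or None if it does not match."""
    match = BACKUP_NAME.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H_%M_%S")


def latest_backup(filenames):
    """Pick the most recent production backup from an iterable of filenames."""
    dated = [(backup_timestamp(name), name) for name in filenames]
    dated = [(stamp, name) for stamp, name in dated if stamp is not None]
    if not dated:
        raise ValueError("no content_store_production backups found")
    return max(dated)[1]


def extract_content_items(archive_path, destination="."):
    """Extract only content_items.bson, mirroring the tar command in step 3."""
    # "r:*" asks tarfile to detect the compression type itself.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extract(MEMBER, path=destination)
    return Path(destination) / MEMBER
```

For example, you could pass the filenames you see in the mongo-api folder to latest_backup to choose which file to download, then call extract_content_items on the downloaded file. If the archive stores its members under a different path, extract raises a KeyError and MEMBER needs adjusting.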
    

Query the local production version of the Content Store using Docker

  1. Set up a local Docker instance by following the instructions in the govuk-mongodb-content GitHub repo readme file.

  2. Access your local version of the Content Store through your local Docker instance by using the Mongo Express graphical interface.

Instead of using Docker, you can query the Content Store database using the:

Content Store best practice

You should use the Mongo query language to transform your data as much as possible before exporting to another programming language.

This lets MongoDB do most of the resource-intensive data processing. It matters especially with a NoSQL database like MongoDB, which does not have a stable schema, so documents of the same type can carry different fields.

Because of this, it can be difficult to unnest fields or replicate Mongo queries in another language, for example in base Python.

MongoDB drivers, such as the Python driver pymongo, can help you run Mongo queries from an alternative language.
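
As an illustration of doing the transformation in Mongo before exporting, here is a minimal sketch that builds an aggregation pipeline in Python and runs it through pymongo. Everything specific in it is an assumption to check against your local setup: the connection URL, the database name content_store, the collection name content_items, and the fields document_type, title and details.parts. The function names are hypothetical.

```python
def guide_parts_pipeline(document_type="guide", limit=100):
    """Build an aggregation pipeline that filters and unnests inside MongoDB.

    Assumption: documents have a document_type field and, for this type,
    a details.parts array whose items have a title.
    """
    return [
        # Keep only one document type, so later stages process less data.
        {"$match": {"document_type": document_type}},
        # Unnest the array: one output row per part.
        {"$unwind": "$details.parts"},
        # Return only the fields needed, flattened to the top level.
        {"$project": {"title": 1, "part_title": "$details.parts.title"}},
        {"$limit": limit},
    ]


def connect(url="mongodb://localhost:27017", database="content_store",
            collection="content_items"):
    """Return a pymongo collection. URL and names are assumptions."""
    # Imported here so the pipeline helpers work without pymongo installed.
    from pymongo import MongoClient  # third-party: pip install pymongo
    return MongoClient(url)[database][collection]


def fetch_rows(collection, pipeline):
    """Run the pipeline in MongoDB and return the flattened rows as a list."""
    return list(collection.aggregate(pipeline))
```

With a local instance running, fetch_rows(connect(), guide_parts_pipeline()) would return rows that are already filtered and flat, so the Python side only has to load them into a table.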

The GOV.UK Developer Docs “schema” page describes possible fields for each document type. However, this is incomplete, and document type schemas are not enforced due to the nature of NoSQL databases. See the MongoDB NoSQL documentation for more information.
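
Because schemas are not enforced, it can help to check which fields actually appear before you rely on them. The sketch below is one possible way to do that in Python on a small sample of documents that you have already fetched as dictionaries. The function names are hypothetical, and document_type is assumed to be the field that identifies the document type.

```python
from collections import Counter, defaultdict


def field_paths(document, prefix=""):
    """Yield a dotted path for every key in a nested document."""
    for key, value in document.items():
        path = f"{prefix}{key}"
        yield path
        if isinstance(value, dict):
            yield from field_paths(value, prefix=f"{path}.")


def field_coverage(documents):
    """For each document type, return the share of documents containing each field path."""
    totals = Counter()
    seen = defaultdict(Counter)
    for doc in documents:
        doc_type = doc.get("document_type", "(missing)")
        totals[doc_type] += 1
        # A set, so a path counts once per document.
        seen[doc_type].update(set(field_paths(doc)))
    return {
        doc_type: {path: count / totals[doc_type] for path, count in paths.items()}
        for doc_type, paths in seen.items()
    }
```

A coverage value below 1.0 for a path means that some documents of that type lack the field, so queries and downstream code should handle it being absent.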

This page was last reviewed on 9 September 2021. It needs to be reviewed again on 9 March 2022.