Use the GOV.UK Content Store database
The Content Store is a MongoDB database of almost all the content published on GOV.UK.
The Content Store does not have everything users see on GOV.UK. It has the content itself, but does not include dynamic elements such as top links on taxon pages, navigation elements or search result pages.
Contact a GOV.UK Data Products software developer through the GOV.UK Data Products Slack channel for more information.
Consider using the GOV.UK mirror if you need a more representative data source of what users actually see on the website.
If you want to access a database that has a copy of every version of every page ever drafted or published on GOV.UK, use the GOV.UK publishing database.
Get access to the Content Store
To access the Content Store, you:
- sign in to AWS and select the correct AWS role
- download a copy of the production version of the Content Store to your local machine
- query the local production version of the Content Store using Docker
Sign in to AWS and select the correct AWS role
Sign in to AWS. See the GOV.UK Developer Docs on getting AWS access for more information.
Select your name in the top right of the screen and select Switch roles.
Under Account, select govuk-infrastructure-integration or 210287912431.
Under Role, select govuk-datascienceusers.
You can enter any text into Display name or leave this field empty.
You can select any colour in Colour. Best practice is to select green for integration, amber for staging and red for production.
Select Switch Role.
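If you work from a script rather than the AWS console, the role switch above can be sketched with the boto3 STS client. This is a minimal sketch, not the documented GOV.UK process. It assumes that you already have base AWS credentials configured locally, that the role ARN follows the standard arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME form with no path, and that the session name is arbitrary. If the role requires multi-factor authentication (MFA), you would also pass SerialNumber and TokenCode.

```python
"""Sketch: assume the govuk-datascienceusers role from Python with boto3.

Assumptions (not from the GOV.UK docs): the role ARN has no path, and base
AWS credentials are already configured (for example in ~/.aws).
"""

from typing import Optional

INTEGRATION_ACCOUNT_ID = "210287912431"  # account from the console steps above
ROLE_NAME = "govuk-datascienceusers"


def role_arn(account_id: str, role_name: str) -> str:
    """Build a standard IAM role ARN (assumes the role has no path)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def assume_role_session(session_name: str = "content-store-download",
                        mfa_serial: Optional[str] = None,
                        mfa_code: Optional[str] = None):
    """Return a boto3 Session that uses temporary credentials for the role."""
    import boto3  # imported here so role_arn() works without boto3 installed

    params = {
        "RoleArn": role_arn(INTEGRATION_ACCOUNT_ID, ROLE_NAME),
        "RoleSessionName": session_name,
    }
    if mfa_serial and mfa_code:
        # Only needed if the role's trust policy requires MFA (an assumption).
        params["SerialNumber"] = mfa_serial
        params["TokenCode"] = mfa_code

    creds = boto3.client("sts").assume_role(**params)["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

The boto3 import sits inside the function so that the ARN helper can be used and checked without AWS libraries installed.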
Download a copy of the production version of the Content Store
Go to the govuk-integration-database-backups AWS S3 bucket and then go to the mongo-api folder.
Download a copy of the production version of the Content Store. The production version file is {DATETIME}-content_store_production.gz, where {DATETIME} is a date and time in YYYY-MM-DDTHH_MM_SS format.
Run the following in the command line to extract the content_items.bson file from the content_store_production folder of the downloaded production version file:
tar -xvf PATH/TO/DATETIME-content_store_production.gz content_store_production/content_items.bson
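The download and extraction steps can also be sketched in Python. This sketch picks the newest backup by parsing {DATETIME} from the file name, then extracts only content_items.bson with the standard library tarfile module, which does the same job as the tar command. The S3 listing helper assumes that the mongo-api folder is the key prefix mongo-api/ at the top of the bucket, and it expects a boto3 Session that already uses the correct role. Treat both as assumptions to check.

```python
"""Sketch: find the newest Content Store backup and extract content_items.bson."""

import tarfile
from datetime import datetime
from pathlib import Path
from typing import Iterable, Iterator, Optional

SUFFIX = "-content_store_production.gz"
DATETIME_FORMAT = "%Y-%m-%dT%H_%M_%S"  # the YYYY-MM-DDTHH_MM_SS format
MEMBER = "content_store_production/content_items.bson"


def list_backup_keys(session, bucket: str = "govuk-integration-database-backups",
                     prefix: str = "mongo-api/") -> Iterator[str]:
    """Yield object keys under the (assumed) mongo-api/ prefix, page by page."""
    paginator = session.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def backup_datetime(name: str) -> Optional[datetime]:
    """Parse {DATETIME} from a backup name, or return None if it does not match."""
    base = name.rsplit("/", 1)[-1]  # drop any S3 prefix such as mongo-api/
    if not base.endswith(SUFFIX):
        return None
    try:
        return datetime.strptime(base[: -len(SUFFIX)], DATETIME_FORMAT)
    except ValueError:
        return None


def latest_backup(names: Iterable[str]) -> Optional[str]:
    """Return the name with the most recent {DATETIME}, ignoring other files."""
    dated = [(backup_datetime(n), n) for n in names]
    dated = [(dt, n) for dt, n in dated if dt is not None]
    return max(dated)[1] if dated else None


def extract_content_items(archive: Path, dest: Path) -> Path:
    """Extract only content_items.bson, like the tar command above."""
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extract(MEMBER, path=dest, **kwargs)
    return dest / MEMBER
```

For example, you could pass the keys from list_backup_keys to latest_backup, download that key with session.client("s3").download_file(bucket, key, local_path), and then call extract_content_items on the local file.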
Query the local production version of the Content Store using Docker
Set up a local Docker instance by following the instructions in the govuk-mongodb-content GitHub repo readme file.
Access your local version of the Content Store through your local Docker instance by using the Mongo Express graphical interface.
Instead of using Docker, you can query the Content Store database using the:
Content Store best practice
You should use the Mongo query language to transform your data as much as possible before exporting it to another programming language.
This means MongoDB does most of the resource-intensive data processing. This is especially important with a NoSQL database like MongoDB, because MongoDB does not enforce a fixed schema, so documents can have different or deeply nested fields.
As a result, it can be difficult to unnest fields or replicate Mongo queries in another language, for example when using base Python to unnest fields.
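The following sketch shows how to move the unnesting into MongoDB instead of base Python. The aggregation pipeline filters to one document type, unwinds a nested array and projects flat columns, so whatever you export is already tabular. The field names document_type, title and expanded_links.organisations are assumptions about how content items are stored. Check them against your own copy first.

```python
"""Sketch: a MongoDB aggregation pipeline that flattens a nested array.

Field names are assumptions about the content_items documents.
"""

from typing import Any, Dict, List


def organisations_pipeline(document_type: str) -> List[Dict[str, Any]]:
    """Build a pipeline giving one row per (page, organisation) pair."""
    return [
        # Filter first so later stages process fewer documents.
        {"$match": {"document_type": document_type}},
        # Emit one document per element of the nested organisations array.
        # By default, $unwind drops documents where the array is missing or empty.
        {"$unwind": "$expanded_links.organisations"},
        # Keep flat, named columns. _id is kept by default, so each row
        # can be traced back to its source document.
        {
            "$project": {
                "title": 1,
                "organisation": "$expanded_links.organisations.title",
            }
        },
    ]
```

To keep pages that have no organisations, you could use the $unwind options form with preserveNullAndEmptyArrays set to true.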
MongoDB drivers, such as the Python driver pymongo, can help you run Mongo queries from another programming language.
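The following is a minimal sketch of running a query through pymongo so that MongoDB does the grouping and sorting, and Python only receives the small result. The connection URI, database name and collection name are hypothetical. Set them to match wherever your local Docker instance holds the restored content_items data.

```python
"""Sketch: run an aggregation server-side with pymongo."""

from typing import Any, Dict, List

# Hypothetical names: adjust to match your local Docker setup.
MONGO_URI = "mongodb://localhost:27017"
DATABASE = "content_store"
COLLECTION = "content_items"


def get_collection(uri: str = MONGO_URI, database: str = DATABASE,
                   collection: str = COLLECTION):
    """Connect to MongoDB and return a pymongo Collection."""
    from pymongo import MongoClient  # requires: pip install pymongo

    return MongoClient(uri)[database][collection]


def count_by_document_type(collection) -> List[Dict[str, Any]]:
    """Count content items per document_type, letting MongoDB do the work."""
    pipeline = [
        {"$group": {"_id": "$document_type", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]
    # allowDiskUse lets large $group and $sort stages spill to disk on the server.
    return list(collection.aggregate(pipeline, allowDiskUse=True))
```

For example, count_by_document_type(get_collection()) returns a list of dictionaries. You can then load this list into a data frame for further analysis.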
The GOV.UK Developer Docs “schema” page describes possible fields for each document type. However, this page is incomplete, and document type schemas are not enforced due to the nature of NoSQL databases, so individual documents may be missing fields or contain extra ones. See the MongoDB NoSQL documentation for more information.
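Because the schema page is incomplete, it can help to check which fields your copy actually contains. The following sketch counts how many documents of one document type have each top-level field. It assumes a document_type field and MongoDB 3.4.4 or later, which added the $objectToArray operator. You would run it with collection.aggregate in the same way as any other pipeline.

```python
"""Sketch: survey which top-level fields actually occur for one document type."""

from typing import Any, Dict, List


def field_frequency_pipeline(document_type: str) -> List[Dict[str, Any]]:
    """Count how many documents of this type contain each top-level field."""
    return [
        {"$match": {"document_type": document_type}},
        # Turn each whole document into an array of {k: field name, v: value} pairs.
        {"$project": {"fields": {"$objectToArray": "$$ROOT"}}},
        {"$unwind": "$fields"},
        # Each field name appears once per document, so this counts documents.
        {"$group": {"_id": "$fields.k", "documents": {"$sum": 1}}},
        # Most common fields first, ties broken alphabetically by field name.
        {"$sort": {"documents": -1, "_id": 1}},
    ]
```

Fields with low counts are the ones you are most likely to find missing when you unnest or export the data.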