Reference information
If you're a new starter in Data Services, the following reference information may be of use.
GDS wiki
The GDS Wiki contains GDS-specific information. Some useful pages include:
- GDS induction
- guidance on performance management
- learning and development, including mandatory training, and using the learning and development budget
Cabinet Office Intranet (CabWeb)
You must sign into the GDS Virtual Private Network (VPN) before you can access CabWeb. See the guidance on signing into the GDS VPN using your Google credentials for more information.
Once you’re signed into the VPN, you can access CabWeb. Some useful pages include:
- the Cabinet Office Analysis hub
- human resources (HR) guidance on the HR hub
- SOP guidance on MyHub
Single Operating Platform (SOP)
SOP is a Cabinet Office-wide platform for most human resource functions, including editing your personal information, accessing your payslip, logging expenses, and requesting special leave. For more information on how to use SOP, see the SOP guidance on CabWeb.
To access SOP:
GDS Business Operations Tool (GBOT)
HR requests are made using the GDS Business Operations Tool (GBOT). For more information on GBOT, ask your line manager or your business operations team.
The GDS Way
The GDS Way is a public-facing website that documents the specific technology, tools, and processes used at GDS and CDDO.
Although it is aimed at software developers, some useful pages include:
- style guides for different programming languages, including the GDS Python style guide
- tracking and managing third-party software dependencies
- building accessible services and understanding WCAG 2.1, compliance with which is a legal responsibility for public sector websites
- best practice on using version control
The GDS Data Audit
The Government Digital Service (GDS) is digital by default, producing and collecting various data. Under the new Government Cyber Security Strategy and the associated Cyber Assessment Framework (CAF), there is a requirement to manage our data sources (also referred to as data assets) effectively. The Data Protection Act (the UK's implementation of the General Data Protection Regulation (GDPR)) also requires us to understand the data we hold and how it is used.
Some of these data sources are captured in the GDS data management audit conducted in 2022 by GDS Data Services. The premise was to create a GitHub repository of data sources with named owners, in the hope that a federated approach would create accountable owners to maintain it. Each data source has an associated markdown document describing the characteristics of the data; these documents are organised by GDS team.
If you create a data source or dataset during your time at GDS, you are obliged (under the CAF and the Data Protection Act) to document it in this repository:
1. Fill in a copy of the data management template for your dataset.
2. Place the resulting document in the organisation folder in this repository, under a new folder for the team responsible for the data. If the team's folder does not exist, create it.
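The filing steps above can be sketched as follows. All file and folder names here are illustrative, not the repository's real ones — check the audit repository itself for the actual template location and your team's folder name.

```python
from pathlib import Path
import shutil

# Illustrative stand-in for the real data management template.
template = Path("data-management-template.md")
template.write_text("# Data management template\n")

# Create your team's folder under the organisation folder if it is missing,
# then file the completed document there.
team_dir = Path("organisation") / "example-team"
team_dir.mkdir(parents=True, exist_ok=True)
shutil.copy(template, team_dir / "example-dataset.md")
```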
This data is itself useful for your colleagues across GDS and also for the Information Assurance team.
GOV.UK Confluence
Ask your delivery manager for access to the GOV.UK Confluence workspace.
Once you have access to GOV.UK Confluence, go to the Analytics on GOV.UK Confluence page. This page has definitions of the custom dimensions used in Google BigQuery analytics tables.
The Aqua book for maintaining analytical quality assurance (AQA)
Our analytical work can have far-reaching implications, including impacting individuals and their livelihoods. The Aqua book provides high-level guidance on producing quality analysis for government, termed analytical quality assurance (AQA). It sets out how departments should ensure their work is fit for purpose through verification and validation.
These checks apply to anything that can be loosely defined as a “model”. If your work takes an input, processes it, and produces an output, this comes under the scope of AQA. This includes but is not limited to visualisations, spreadsheets, machine learning models, and even back-of-napkin-type calculations.
The Aqua book establishes four principles:
Proportionality of response
The extent of the analytical quality assurance effort should be proportionate to the risks associated with the intended use of the analysis. These risks include financial, legal, operational and reputational impacts. In addition, analysis that is frequently used to support a decision-making process may require a more comprehensive analytical quality assurance response.
Assurance throughout development
Quality assurance considerations should be taken into account throughout the life cycle of the analysis and not only at the end. Effective communication is crucial when understanding the problem, designing the analytical approach, conducting the analysis and relaying the outputs.
Verification and validation
Analytical quality assurance is more than checking that the analysis is error-free and satisfies its specification (verification). It must also include checks that the analysis is appropriate, that is, fit for the purpose for which it is being used (validation).
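As a toy sketch of this distinction, consider a trivial "model" (invented here for illustration) that computes a cost per user:

```python
def cost_per_user(total_cost_pounds: float, users: int) -> float:
    """A minimal 'model': takes inputs, processes them, produces an output."""
    return total_cost_pounds / users

# Verification: the code is error-free and satisfies its specification.
assert cost_per_user(1000.0, 40) == 25.0

# Validation is a separate question that no amount of testing can answer:
# is cost per user actually the right measure for the decision being made,
# and are the input figures themselves appropriate?
```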
Analysis with RIGOUR
Quality analysis needs to be:
- repeatable (R)
- independent (I)
- grounded in reality (G)
- objective (O)
- with uncertainty understood and managed (U)
- robust, so that the results address the initial question (R)
In particular, it is important to accept that uncertainty is inherent within the inputs and outputs of any piece of analysis. It is important to establish how much we can rely upon the analysis for a given problem.
These principles must be considered when undertaking any work involving data or models.
Note that AQA is not just about software quality assurance: it can also include dealing with ethical considerations, the reasons for choosing a particular method or technique, and validating analytical assumptions and caveats.
Further information
Additional Aqua book resources are available, and the Government Analytical Function, Government Data Quality Hub, and other departments have also produced:
- guides to ensure your work is fit for purpose when working to very tight deadlines
- guides to ensure your data is fit for purpose
- a curriculum around quality assurance, validation, and data linkage
GOV.UK developer documentation
See the documentation on document types on GOV.UK for information on the various document types used across the site. Note that this list may be incomplete.
The documentation on the analytics (GA4) implementation on GOV.UK may also be of interest, providing information on how we collect GA4 data.
The GOV.UK GA4 implementation record documents the dataLayer pushes implemented on GOV.UK, which provide the majority of the information sent to GA4.
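As a rough sketch of the concept (in Python for consistency with the other examples here; the real pushes are JavaScript, and the field names below are invented, not the actual GOV.UK schema):

```python
# A dataLayer is essentially a shared list of event objects that the
# analytics tag reads; window.dataLayer.push(...) appends to that list.
data_layer: list[dict] = []

def push(event: dict) -> None:
    """Mimic window.dataLayer.push, for illustration only."""
    data_layer.append(event)

# An illustrative page-view push with made-up field names.
push({"event": "page_view", "page_title": "Example page"})
```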
View the JSON of a GOV.UK page
You can view the JSON of a GOV.UK page by either:
- using the GOV.UK Toolkit browser extension for Chrome and Firefox
- adding /api/content into the page URL, for example, changing https://www.gov.uk/browse/benefits/disability to https://www.gov.uk/api/content/browse/benefits/disability
Using either of these methods lets you view the A and B versions of a page.
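The URL change above is mechanical, so it can be scripted. A small sketch (the function name is our own, not part of any GOV.UK tooling):

```python
from urllib.parse import urlparse, urlunparse

def to_content_api_url(page_url: str) -> str:
    """Prefix a GOV.UK page path with /api/content to get its JSON endpoint."""
    parts = urlparse(page_url)
    return urlunparse(parts._replace(path="/api/content" + parts.path))

print(to_content_api_url("https://www.gov.uk/browse/benefits/disability"))
# https://www.gov.uk/api/content/browse/benefits/disability
```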
Code examples
The following content has code examples for different data sources.
Google BigQuery code examples
Google BigQuery code examples are available in the govuk-data-labs-onboarding GitHub repo.
Content Store code examples
Downloading the Content Store may take some time.
If you need to use the Content Store in a project, you can instead use the:
- JavaScript files in the govuk-intent-detector GitHub repo
- PyMongo Jupyter notebook in the define-content-schemas branch of the govuk-intent-detector GitHub repo
GOV.UK mirror code examples
Downloading the GOV.UK mirror may take some time.
If you need to use the GOV.UK mirror in a project, you can instead use the page term TF-IDF matrix notebooks in the govuk-intent-detector GitHub repo.
Learning and development resources
| Free? | Materials | Notes |
| --- | --- | --- |
| No | O’Reilly ebooks through ACM membership | O’Reilly Media publishes technology-oriented books, with an associated app for reading them on the go. Useful books and videos include: |
| No | Standard individual licence for Pluralsight | Pluralsight provides online courses that lean towards software development and engineering. Some useful courses include: |
| Yes | Advanced NLP with spaCy | Free online course by the creators of spaCy on natural language processing, including exercises, slides, videos, multiple-choice questions, and interactive, browser-based coding practice. |
| Depends | Coursera | Coursera hosts a number of courses on data science. You can “audit” courses for free, but you cannot complete certain assignments or obtain a completion certificate. It’s generally not worth paying for the courses. Good courses include: |
| Yes | fast.ai | Online courses on deep learning using fast.ai, practical data ethics, computational linear algebra, and natural language processing. |
| Yes | Interpretable Machine Learning | Accessible book on interpretable machine learning, covering interpretable models as well as model-agnostic methods for interpretability. |
| Yes | The Illustrated Word2vec, on Jay Alammar’s GitHub Pages | An illustrated guide to word2vec. The author, Jay Alammar, also has a whole host of other illustrated guides. |
| Yes | Causal Inference for The Brave and True | A light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis. |
| Yes | Datasheets for Datasets | A paper proposing how to document datasets. |
| Yes | Managing Python Environments | Short blog post by Pluralsight summarising how to manage Python environments. |
| Yes | Hypermodern Python | A review of current best practice for Python projects. |
| Yes | Mathematics for Machine Learning | A book on the mathematical skills needed to interpret other, more advanced machine learning books. |
| Yes | huggingface/datasets | The largest hub of ready-to-use natural language processing datasets for machine learning models, with fast, easy-to-use and efficient data manipulation tools. |
| Yes | ONS Best Practice and Impact - Quality Assurance of Code for Analysis and Research | Cross-government guidance on best practice for analysis and research. |
| Yes | ethen8181/machine-learning | Machine learning tutorials. |
| Yes | ageron/handson-ml2 | Complementary code for the Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow O’Reilly book. |
| Yes | awesomedata/awesome-public-datasets | A topic-centric list of high-quality open datasets. |
| Yes | Made with ML | Machine learning operations and engineering courses. |
| Yes | ikatsov/tensor-house | A collection of reference machine learning and optimisation models for enterprise operations, including marketing, pricing, and supply chain. |
| Yes | jghoman/awesome-apache-airflow | Resources for Apache Airflow. |
| Yes | aws/amazon-sagemaker-examples | Amazon SageMaker examples - these are automatically loaded into SageMaker instances. |
| Yes | Chris-Engelhardt/data_sci_guide | A community-curated list of data science courses, including direct, free replacements for paid options. |
| No | Introduction to Statistical Learning: With Applications in R | An accessible primer on machine learning - recommended for newcomers to data science, and as a refresher. |
| Yes | datastacktv/data-engineer-roadmap | A roadmap for those wishing to study data engineering. |
| Yes | alastairrushworth/free-data-science | Resources and learning materials across a broad range of popular data science topics, arranged thematically. |