
Reference information

If you are a new starter in Data Services, you may find the following reference information useful.

GDS wiki

The GDS Wiki contains GDS-specific information. Some useful pages include:

Cabinet Office Intranet (CabWeb)

You must sign into the GDS Virtual Private Network (VPN) before you can access CabWeb. See the guidance on signing into the GDS VPN using your Google credentials for more information.

Once you’re signed into the VPN, you can access CabWeb. Some useful pages include:

Single Operating Platform (SOP)

SOP is a Cabinet Office-wide platform for most human resource functions, including editing your personal information, accessing your payslip, logging expenses, and requesting special leave. For more information on how to use SOP, see the SOP guidance on CabWeb.

To access SOP:

GDS Business Operations Tool (GBOT)

HR requests are made using the GDS Business Operations Tool (GBOT). For more information on GBOT, ask your line manager or your business operations team.

The GDS Way

The GDS Way is a public-facing website that documents the specific technology, tools, and processes used at GDS and CDDO.

Although The GDS Way is aimed at software developers, some useful pages include:

The GDS Data Audit

The Government Digital Service (GDS) is digital by default, and produces and collects a variety of data. Under the Government Cyber Security Strategy and the associated Cyber Assessment Framework (CAF), we are required to manage our data sources (also referred to as data assets) effectively. The Data Protection Act 2018 (the UK’s implementation of the General Data Protection Regulation (GDPR)) also requires us to understand the data we hold and how it is used.

Some of these data sources are captured in this GDS data management audit, conducted in 2022 by GDS Data Services. The premise was to create a GitHub repository of data sources with named owners, in the hope that a federated approach would make those owners accountable for maintaining it. Each data source has an associated markdown document, organised by GDS team, that describes the characteristics of the data.

If you create a data source or dataset during your time at GDS, you are obliged under the CAF and the Data Protection Act to document it in this repository:

  1. Fill in a copy of the data management template for your dataset

  2. Place the resulting document under the organisation folder in this repository, in a folder for the team that is responsible for the data. If that team folder does not exist, create one.
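The two steps above can be scripted. The Python sketch below assumes a local clone of the audit repository in which the organisation folder is literally named `organisation`, a template file you supply, and one markdown file per dataset. These names, and the function `add_dataset_doc`, are hypothetical, so check the repository itself for its actual layout before relying on it.

```python
import shutil
from pathlib import Path


def add_dataset_doc(repo_root: Path, team: str, dataset: str, template: Path) -> Path:
    """Copy the data management template to <repo_root>/organisation/<team>/<dataset>.md.

    The "organisation" folder name and the one-file-per-dataset naming are
    hypothetical; check the audit repository for its real layout.
    """
    # Create the team folder under the organisation folder if it does not exist yet
    team_dir = repo_root / "organisation" / team
    team_dir.mkdir(parents=True, exist_ok=True)

    # Start the record from a copy of the template, never overwriting an existing one
    target = team_dir / f"{dataset}.md"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(template, target)
    return target


# Example (hypothetical paths); afterwards, fill in the copied template by hand:
# add_dataset_doc(Path("."), "data-services", "my-dataset", Path("template.md"))
```

Refusing to overwrite an existing file is a deliberate choice: an existing record may already have an accountable owner, so it should be edited rather than replaced.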

This documentation is useful for your colleagues across GDS and also for the Information Assurance team.

GOV.UK Confluence

Ask your delivery manager for access to the GOV.UK Confluence workspace.

Once you have access to GOV.UK Confluence, go to the Analytics on GOV.UK Confluence page. This page has definitions of the custom dimensions used in Google BigQuery analytics tables.
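For orientation: in the Universal Analytics (GA360) BigQuery export, each hit stores its custom dimensions as a repeated `customDimensions` field of `index`/`value` pairs, so you look a dimension up by its index number. In BigQuery SQL this is usually done with `UNNEST`. The Python sketch below applies the same lookup to a single hit represented as a dictionary. It assumes that export shape (GA4 exports use a different schema), and the index numbers and values shown are made up. Use the Confluence page for the real definitions.

```python
from typing import Any


def custom_dimensions(hit: dict[str, Any]) -> dict[int, str]:
    """Flatten a hit's repeated customDimensions field into {index: value}.

    Assumes the Universal Analytics (GA360) BigQuery export shape, where each
    hit holds a list of {"index": int, "value": str} records.
    """
    return {cd["index"]: cd["value"] for cd in hit.get("customDimensions", [])}


# Hypothetical hit record; real index numbers and meanings are on the Confluence page
hit = {"customDimensions": [{"index": 2, "value": "guide"}, {"index": 4, "value": "example-id"}]}
print(custom_dimensions(hit).get(2))  # guide
```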

The Aqua book for maintaining analytical quality assurance (AQA)

Our analytical work can have far-reaching implications, including impacting individuals and their livelihoods. The Aqua book provides high-level guidance on producing quality analysis for government. This is termed analytical quality assurance (AQA). This book sets out how departments should ensure their work is fit-for-purpose through verification and validation.

These checks apply to anything that can be loosely defined as a “model”. If your work takes an input, processes it, and produces an output, this comes under the scope of AQA. This includes but is not limited to visualisations, spreadsheets, machine learning models, and even back-of-napkin-type calculations.

The Aqua book establishes four principles:

Proportionality of response

The extent of the analytical quality assurance effort should be proportionate to the risks associated with the intended use of the analysis. These risks include financial, legal, operational and reputational impacts. In addition, analysis that is frequently used to support a decision-making process may require a more comprehensive analytical quality assurance response.

Assurance throughout development

Quality assurance considerations should be taken into account throughout the life cycle of the analysis and not only at the end. Effective communication is crucial when understanding the problem, designing the analytical approach, conducting the analysis and relaying the outputs.

Verification and validation

Analytical quality assurance is more than checking that the analysis is error-free and satisfies its specification (verification). It must also include checks that the analysis is appropriate, that is, fit for the purpose for which it is being used (validation).
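To make the distinction concrete, here is a toy Python sketch. The model `monthly_cost`, the test inputs and the plausibility bounds are all hypothetical. The verification step checks the code against its specification. The validation step is only a crude stand-in, because real validation also means confirming with the customer that the analysis answers the question being asked.

```python
def monthly_cost(annual_cost: float) -> float:
    """Hypothetical model: spread an annual cost evenly over 12 months."""
    return annual_cost / 12


def verify() -> None:
    # Verification: does the code meet its specification on known cases?
    assert monthly_cost(1200.0) == 100.0
    assert monthly_cost(0.0) == 0.0


def implausible(outputs: list[float], lower: float, upper: float) -> list[float]:
    # Crude validation aid: flag outputs outside a range agreed with the
    # customer as plausible for the intended use (the bounds are assumptions).
    return [x for x in outputs if not lower <= x <= upper]


verify()
results = [monthly_cost(c) for c in (1200.0, 60_000.0, 2_400_000.0)]
print(implausible(results, 0.0, 50_000.0))  # [200000.0]
```

Here the code passes verification, yet one output fails the plausibility check. Error-free analysis can still be unfit for its purpose.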

Analysis with RIGOUR

Quality analysis should be:

  • repeatable (R)
  • independent (I)
  • grounded in reality (G)
  • objective (O)
  • uncertainty-managed: uncertainty is understood and managed (U)
  • robust: the results address the initial question robustly (R)

In particular, accept that uncertainty is inherent in the inputs and outputs of any piece of analysis, and establish how far the analysis can be relied upon for a given problem.
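One common way to quantify uncertainty in a summary statistic is a bootstrap confidence interval. This is one technique among many, not something the Aqua book prescribes. The sketch below computes a percentile bootstrap interval for a mean using only the Python standard library. It fixes the random seed so that rerunning it gives the same interval, which also supports repeatability. The function name, default parameters and example data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(
    data: list[float], n_resamples: int = 2000, alpha: float = 0.05, seed: int = 42
) -> tuple[float, float]:
    """Percentile bootstrap (1 - alpha) confidence interval for the mean of data."""
    rng = random.Random(seed)  # fixed seed so the interval is repeatable
    # Resample with replacement, record each resample's mean, then sort
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical daily visit counts
visits = [120.0, 135.0, 98.0, 143.0, 110.0, 127.0, 101.0, 139.0]
low, high = bootstrap_mean_ci(visits)
print(f"mean {statistics.fmean(visits):.1f}, 95% interval {low:.1f} to {high:.1f}")
```

Reporting the interval alongside the point estimate tells the reader how much the figure could plausibly move with different data. The interval assumes the sample is representative, and that assumption should be stated as a caveat.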

These principles must be considered when undertaking any work involving data/models.

Note that AQA is not just about software quality assurance. It can also include dealing with ethical considerations, reasons for choosing the method/technique, and validating analytical assumptions and caveats.

Further information

Additional Aqua book resources are available, and the Government Analytical Function, Government Data Quality Hub, and other departments have also produced:

GOV.UK developer documentation

See the documentation on document types on GOV.UK for information on the various document types present on GOV.UK. This list may be incomplete.

The documentation on the analytics (GA4) implementation on GOV.UK may also be of interest, providing information on how we collect GA4 data.

The GOV.UK GA4 implementation record documents the dataLayer pushes implemented on GOV.UK, which provide the majority of the information sent to GA4.

View the JSON of a GOV.UK page

You can view the JSON of a GOV.UK page by either:

Using either of these methods lets you view the A and B versions of a page.
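Separately from the methods above, GOV.UK has a public Content API that returns a page's JSON content item when you prefix the page's path with `https://www.gov.uk/api/content`. The minimal Python sketch below builds that URL and fetches it with the standard library. The helper names are hypothetical, fetching requires network access, and the sketch does not handle viewing the A and B versions of a page.

```python
import json
import urllib.request
from urllib.parse import urlsplit

CONTENT_API = "https://www.gov.uk/api/content"


def content_api_url(page_url: str) -> str:
    """Map a GOV.UK page URL or base path to its Content API URL.

    Query strings and fragments are dropped; only the path is kept.
    """
    path = urlsplit(page_url).path or "/"
    return CONTENT_API + path


def fetch_content_item(page_url: str) -> dict:
    """Download and parse a page's JSON content item (requires network access)."""
    with urllib.request.urlopen(content_api_url(page_url), timeout=30) as response:
        return json.load(response)


print(content_api_url("https://www.gov.uk/browse/tax?x=1"))
# https://www.gov.uk/api/content/browse/tax
# item = fetch_content_item("https://www.gov.uk/browse/tax"); print(item["title"])
```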

Code examples

The following content has code examples for different data sources.

Google BigQuery code examples

Google BigQuery code examples are available in the govuk-data-labs-onboarding GitHub repo.

Content Store code examples

Downloading the Content Store may take some time.

If you need to use the Content Store in a project, you can instead use the:

GOV.UK mirror code examples

Downloading the GOV.UK mirror may take some time.

If you need to use the GOV.UK mirror in a project, you can instead use the page term TF-IDF matrix notebooks in the govuk-intent-detector GitHub repo.
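As a rough illustration of what a page term TF-IDF matrix contains, the Python sketch below builds one from a few page texts. Each row is a page, each column is a term, and each cell is the term's count in the page multiplied by log(N / df). Here N is the number of pages and df is the number of pages containing the term. This is a plain textbook weighting with whitespace tokenisation, not the govuk-intent-detector implementation. In practice, scikit-learn's `TfidfVectorizer` is the usual library route, and it applies a smoothed IDF by default.

```python
import math
from collections import Counter


def tfidf_matrix(pages: dict[str, str]) -> tuple[list[str], list[str], list[list[float]]]:
    """Build a page-by-term TF-IDF matrix from {page_path: text}.

    TF is the raw term count; IDF is log(N / df). Tokenisation is a simple
    lowercase whitespace split, an assumption made for brevity.
    """
    docs = {path: Counter(text.lower().split()) for path, text in pages.items()}
    vocab = sorted({term for counts in docs.values() for term in counts})
    n_docs = len(docs)
    # Document frequency: how many pages contain each term
    df = {term: sum(1 for counts in docs.values() if term in counts) for term in vocab}
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    # One row per page, one column per vocabulary term
    rows = [[counts[term] * idf[term] for term in vocab] for counts in docs.values()]
    return list(docs), vocab, rows


# Hypothetical page texts keyed by path
paths, vocab, rows = tfidf_matrix(
    {"/a": "tax return tax", "/b": "passport renewal", "/c": "tax passport"}
)
print(vocab)  # ['passport', 'renewal', 'return', 'tax']
```

A term that appears on every page gets an IDF of log(1) = 0, so it contributes nothing when distinguishing one page from another.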

Learning and development resources

Free? Materials Notes
No O’Reilly ebooks through ACM membership O’Reilly Media publishes technology-oriented books with an associated app for reading their books on the go. Useful books and videos include:
No Standard individual licence for Pluralsight Pluralsight provides online courses that lean towards software development and engineering. Some useful courses include:
Yes Advanced NLP with spaCy Free online course by the creators of spaCy on natural language processing, including exercises, slides, videos, multiple choice questions, and interactive, browser-based coding practice.
Depends Coursera Coursera hosts a number of courses on data science. You can “audit” courses for free, but you cannot complete certain assignments or obtain a completion certificate. It’s generally not worth paying for the courses. Good courses include:
Yes fast.ai Online courses on deep learning using fast.ai, practical data ethics, computational linear algebra, and natural language processing.
Yes Interpretable Machine Learning Accessible book on interpretable machine learning, including interpretable machine learning models, as well as model-agnostic methods for interpretability.
Yes The Illustrated Word2vec (Jay Alammar’s GitHub Pages) An illustrated guide to word2vec. The author, Jay Alammar, also has a whole host of other illustrated guides.
Yes Causal Inference for The Brave and True A light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis.
Yes Datasheets for Datasets A paper proposing how to document datasets.
Yes Managing Python Environments Short blog post by Pluralsight summarising how to manage Python environments.
Yes Hypermodern Python A review of Python best practice for projects.
Yes Mathematics for Machine Learning A book covering the mathematical skills needed to follow more advanced machine learning books.
Yes huggingface/datasets The largest hub of ready-to-use natural language processing datasets for machine learning models with fast, easy-to-use and efficient data manipulation tools.
Yes ONS Best Practice and Impact - Quality Assurance of Code for Analysis and Research Cross-government guidance on best practice for analysis and research.
Yes ethen8181/machine-learning Machine learning tutorials
Yes ageron/handson-ml2 Companion code for the O’Reilly book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
Yes awesomedata/awesome-public-datasets A topic-centric list of high quality open datasets.
Yes Made with ML Machine learning operations and engineering courses.
Yes ikatsov/tensor-house A collection of reference machine learning and optimization models for enterprise operations, including marketing, pricing, and supply chain.
Yes jghoman/awesome-apache-airflow Resources for Apache Airflow.
Yes aws/amazon-sagemaker-examples AWS SageMaker examples - these are automatically loaded into SageMaker instances.
Yes Chris-Engelhardt/data_sci_guide A community-curated list of data science courses, including direct, free replacement courses for paid options.
No Introduction to Statistical Learning: With Applications in R An accessible primer into machine learning - recommended read for newcomers to data science, and as a refresher.
Yes datastacktv/data-engineer-roadmap Roadmap for those wishing to study data engineering.
Yes alastairrushworth/free-data-science Resources and learning materials across a broad range of popular data science topics, arranged thematically.
This page was last reviewed on 9 August 2024. It needs to be reviewed again on 9 February 2025.
This page was set to be reviewed before 9 February 2025. This might mean the content is out of date.