Spider diagram tool

The spider diagram tool is a visualisation tool that shows which pages users:

come from before visiting a page of interest
go to after visiting a page of interest

The data covers a date range that you choose and is broken down by:

device category
internal/external links
individual entry/exit pages paired with the count and proportion of page views

This tool was developed within Data Services for use by analysts within GDS.

Using the spider diagram tool

Get access to the spider diagram tool

The spider diagram tool can be found in the ‘Path tools’ folder in the ‘Performance and Data Analysts Community’ shared Google Drive.

Running the spider diagram tool

To use the spider diagram tool, you must do the following:

Make a copy of the spider diagram tool notebook.
Open the tool notebook in Google Colab.
Run the tool notebook.
View the outputs and save a local copy of the data.

To run the notebook, you must do the following:

Authenticate your access.
Set the input variables.
Run the query cells.
Run the Execute queries cell.
Create and download the visualisation.

To authenticate your access, you should:

Run the cells in order. When you run cell 1 - auth.authenticate_user() - you will see a pop-up asking you to authenticate.
Follow the on-screen prompts, selecting your account and then Allow when prompted.
When you have successfully authenticated, the cell shows a small green tick.

If you receive a warning message saying “The notebook was not authored by Google”, select Run Anyway.

Set the input variables

You must complete and run the cells in the Input Variables section.

Complete the Set Input Variables Form by entering the following fields:

start_date and end_date - the date range you want to look at, in YYYY/MM/DD format
page_path - the URL for the page of interest which always starts with /, for example /brexit
path_or_title - whether the visualisation will show the page paths or page titles, for example /find-a-job or Find a job
remove_query_string - check this box if you want to remove the query string from the URL, for example if you’ve input an answer into a smart answer field
device categories - you must select at least one device type from Desktop, Mobile or Tablet, otherwise the code will not run and will raise a ValueError

You do not need to change the following fields:

project_id - this is always gds-bq-reporting
ga_dataset - this is always analytics_330577055 - the GOV.UK GA4 Production dataset

Select Runtime and then Run after to run all the cells in the Input Variables section.
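
The checks described for the form fields above can be sketched in Python as follows. This is a minimal illustration, not the notebook's own code: the function name validate_inputs, the ALLOWED_DEVICES constant and the check that start_date is not after end_date are assumptions made for this example. Only the YYYY/MM/DD format, the leading / on page_path and the ValueError for an empty device selection come from the description above.

```python
from datetime import datetime

# Device types offered by the form, lower-cased for comparison.
ALLOWED_DEVICES = {"desktop", "mobile", "tablet"}


def validate_inputs(start_date, end_date, page_path, device_categories):
    """Check form values before any query is built (hypothetical helper)."""
    # Dates are entered as YYYY/MM/DD; strptime raises ValueError otherwise.
    start = datetime.strptime(start_date, "%Y/%m/%d").date()
    end = datetime.strptime(end_date, "%Y/%m/%d").date()
    # Assumption for this sketch: the range must not be reversed.
    if start > end:
        raise ValueError("start_date must not be after end_date")
    if not page_path.startswith("/"):
        raise ValueError("page_path must start with /")
    devices = {d.lower() for d in device_categories}
    if not devices:
        raise ValueError("select at least one device category")
    unknown = devices - ALLOWED_DEVICES
    if unknown:
        raise ValueError(f"unknown device categories: {sorted(unknown)}")
    return start, end, page_path, sorted(devices)
```

For example, validate_inputs("2024/01/01", "2024/01/31", "/brexit", ["Desktop", "Mobile"]) returns the parsed dates, the path and the normalised device list, while passing an empty device list raises a ValueError.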

Run the query cells

Run the following cells in this order:

Query – Previous Page Path.
Query – Acquisition Source.
Query – Next Page Path.

Run the Execute queries cell

The Execute queries cell estimates and shows you the amount of data read by the query in gigabytes.

If you’re happy to run the query, enter “yes” into the user input box.

If you leave the input box blank or type in something other than “yes”, the query will not run.
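
The confirmation step described above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: the helper names bytes_to_gigabytes and confirm_query are hypothetical, the size estimate is passed in rather than obtained from BigQuery, the gigabyte is taken to be 1024³ bytes, and surrounding whitespace in the answer is ignored.

```python
BYTES_PER_GIGABYTE = 1024 ** 3  # assumption: binary gigabytes


def bytes_to_gigabytes(num_bytes):
    """Convert a byte count to gigabytes."""
    return num_bytes / BYTES_PER_GIGABYTE


def confirm_query(estimated_bytes, ask=input):
    """Return True only if the user answers "yes" (hypothetical helper).

    `ask` defaults to the built-in input() and can be swapped for testing.
    """
    gigabytes = bytes_to_gigabytes(estimated_bytes)
    answer = ask(f"This query will read about {gigabytes:.2f} GB. Run it? ")
    # A blank answer, or anything other than "yes", means the query is skipped.
    # Assumption: leading and trailing spaces are trimmed first.
    return answer.strip() == "yes"
```

Passing the prompt function as a parameter keeps the decision logic separate from the user interaction, so the same check can be exercised without typing into an input box.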

Create and download the visualisation

Once the Input Variables, query and Execute queries cells have finished running, the notebook generates the interactive Plotly figure.

To download the plot as a .png, hover your cursor over the figure and select the camera icon located in the top right-hand menu.

Save a local copy of the data

Once the Input Variables, query and Execute queries cells have finished running, the notebook downloads the following files to your local machine:

a CSV file of the entry and exit data, including the number and proportion of page views
an HTML file of the Plotly figure
a text file of the metadata for the executed SQL queries

If none or only some of the files are downloaded, check the end of your browser’s address bar. If you see a download icon with a red cross, select the icon, change the option to Always allow…, and then select Done.

How the spider diagram tool works

Data sources

The spider diagram tool uses the GOV.UK GA4 data as stored in BigQuery.

The full query is detailed in the tool notebook, and the original SQL can be found in the Original SQL query cell at the bottom of the notebook.

Assumptions and caveats

This log contains a list of assumptions and caveats used in the spider diagram tool analysis.

Assumptions are Red-Amber-Green (RAG)-rated according to the following definitions for quality and impact.

RAG rating	Assumption quality	Assumption impact
Green	Reliable assumption, well understood and/or documented. Anything up to a validated and recent set of actual data.	Marginal assumptions whose changes have no or limited impact on the outputs.
Amber	Some evidence to support the assumption. May vary from a source with poor methodology to a good source that’s a few years old.	Assumptions with a relevant, even if not critical, impact on the outputs.
Red	Little evidence to support the assumption. May vary from an opinion to a limited data source with poor methodology.	Core assumptions of the analysis, whose changes would drastically affect the outputs.

Thank you to the Home Office Analytical Quality Assurance team for these definitions.

Group all pages that have less than 2.5% of sessions to `(other)`

Quality: Green (for visualisation purposes)
Impact: Green (for visualisation purposes)

Pages that contain less than 2.5% of overall sessions are difficult to see in the diagram both in terms of their volume and any associated labels.

To mitigate this, all pages with less than 2.5% of overall sessions are aggregated into a separate category, (other). This happens immediately after the tool has executed the SQL queries, but before the tool has rendered the visualisations and outputs.

As this is strictly for visualisation purposes, it is acceptable. However, any downstream analysis of the outputs is limited by this aggregation. Outputs of this tool should not be used for further quantitative analysis without a thorough understanding of this assumption’s implications.
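
The aggregation can be sketched in Python as follows. This is a minimal illustration of the rule described above, not the notebook's implementation: the function name group_small_pages and the dictionary input (page mapped to session count) are assumptions made for this example.

```python
OTHER_LABEL = "(other)"
THRESHOLD = 0.025  # 2.5% of overall sessions


def group_small_pages(sessions_by_page, threshold=THRESHOLD):
    """Fold pages below the threshold share into one "(other)" entry.

    Hypothetical helper: takes a dict of page -> session count and returns
    a new dict with the same total number of sessions.
    """
    total = sum(sessions_by_page.values())
    grouped = {}
    other = 0
    for page, sessions in sessions_by_page.items():
        if total and sessions / total < threshold:
            other += sessions  # too small to label clearly in the diagram
        else:
            grouped[page] = sessions
    if other:
        grouped[OTHER_LABEL] = grouped.get(OTHER_LABEL, 0) + other
    return grouped
```

With 1,000 sessions in total, pages with 15 and 5 sessions (1.5% and 0.5%) would be combined into a single (other) entry of 20 sessions, and the individual counts for those pages would no longer be recoverable from the output.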

Notebook tool excludes URL query parameters and anchors from page paths

Quality: Green
Impact: Green

The notebook tool removes all URL query parameters and anchors from page paths during SQL execution, so that page views are associated with the general page path rather than with specific query parameters and anchors.

The tool does this because its overall aim is to show which page paths have been viewed, regardless of query parameters and anchors. This is in line with Google Analytics, which also excludes query parameters and anchors from the page path.
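
The tool performs this step in SQL. The same idea is shown below in Python purely as an illustration; the function name strip_query_and_anchor is hypothetical and the example URLs are made up.

```python
from urllib.parse import urlsplit


def strip_query_and_anchor(page_location):
    """Return only the path part of a URL or page path (illustrative helper)."""
    # urlsplit separates the path from the query string (after "?")
    # and the anchor, also called the fragment (after "#").
    return urlsplit(page_location).path
```

For example, strip_query_and_anchor("/find-a-job?q=nurse#results") returns "/find-a-job", so views of that page are counted together whatever query string or anchor was present.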

This page was last reviewed on 25 November 2024. It needs to be reviewed again on 25 May 2025.