
Use the spider diagram tool

The spider diagram tool is a visualisation tool that shows which pages users:

  • come from before visiting a page of interest
  • go to after visiting a page of interest

This data is over a date range and is broken down by:

  • device category
  • internal/external links
  • individual entry/exit pages paired with the count and proportion of page views

This tool was developed within Data Services for use by analysts within GDS, and can be found in the ‘Path tools’ folder in the ‘Performance and Data Analysts Community’ shared Google Drive.

To use the spider diagram tool, you must do the following:

  1. Make a copy of the spider diagram tool notebook.
  2. Open the tool notebook in Google Colab.
  3. Run the tool notebook.
  4. View the outputs and save a local copy of the data.

Run the tool notebook

To run the notebook, you must do the following:

  1. Authenticate your access.
  2. Set the input variables.
  3. Run the query cells.
  4. Run the Execute queries cell.
  5. Create and download the visualisation.

Authenticating your access

  1. Run the cells in order. When you run cell 1, auth.authenticate_user(), a pop-up asks you to authenticate.
  2. Follow the on-screen prompts, selecting your account and then Allow when prompted.
  3. When you have successfully authenticated, the cell shows a small green tick.

If you receive a warning message saying “The notebook was not authored by Google”, select Run Anyway.

Set the input variables

You must complete and run the cells in the Input Variables section.

  1. Complete the Set Input Variables Form by entering the following fields:
  • start_date and end_date - the date range you want to look at, in YYYY/MM/DD format
  • page_path - the URL for the page of interest which always starts with /, for example /brexit
  • path_or_title - whether the visualisation will show the page paths or page titles, for example /find-a-job or Find a job
  • remove_query_string - check this box if you want to remove the query string from the URL, for example if you’ve input an answer into a smart answer field
  • device categories - you must select at least one device type from Desktop, Mobile or Tablet, otherwise the code will not run and will raise a ValueError

You do not need to change the following fields:

  • project_id - this is always gds-bq-reporting
  • ga_dataset - this is always analytics_330577055, the GOV.UK GA4 Production dataset

  2. Select Runtime and then Run after to run all the cells in the Input Variables section.
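
The checks described for the form fields can be sketched in Python as follows. This is a minimal illustration, not the notebook's own code: the function name validate_inputs, the ALLOWED_DEVICES constant and the check that start_date is not after end_date are all assumptions made for this example.

```python
from datetime import datetime

# Device categories named in the form (assumed spelling and capitalisation).
ALLOWED_DEVICES = {"Desktop", "Mobile", "Tablet"}


def validate_inputs(start_date, end_date, page_path, device_categories):
    """Hypothetical helper: check the form values before any query is built."""
    # Dates are expected in YYYY/MM/DD format; strptime raises ValueError otherwise.
    start = datetime.strptime(start_date, "%Y/%m/%d").date()
    end = datetime.strptime(end_date, "%Y/%m/%d").date()
    if start > end:  # assumption: the notebook may not perform this check
        raise ValueError("start_date must not be after end_date")

    # The page of interest is a path, so it must start with "/".
    if not page_path.startswith("/"):
        raise ValueError("page_path must start with '/'")

    # At least one device category must be selected.
    selected = set(device_categories)
    if not selected:
        raise ValueError("select at least one device category")
    unknown = selected - ALLOWED_DEVICES
    if unknown:
        raise ValueError(f"unknown device categories: {sorted(unknown)}")

    return start, end, page_path, sorted(selected)
```

For example, validate_inputs("2024/01/01", "2024/01/31", "/brexit", ["Mobile"]) passes, while an empty device list raises a ValueError, mirroring the behaviour described above.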

Run the query cells

Run the following cells in this order:

  1. Query – Previous Page Path.
  2. Query – Acquisition Source.
  3. Query – Next Page Path.

Run the Execute queries cell

The Execute queries cell estimates and shows you the amount of data read by the query in gigabytes.

If you’re happy to run the query, enter “yes” into the user input box.

If you leave the input box blank or type in something other than “yes”, the query will not run.
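
The estimate-then-confirm step might look something like the sketch below. It assumes the notebook uses the google-cloud-bigquery client and a dry-run query to obtain the estimate; the source does not confirm this. The helper names (estimate_bytes, bytes_to_gigabytes, should_run) and the 1 GB = 10^9 bytes convention are assumptions for illustration only.

```python
def estimate_bytes(client, sql: str) -> int:
    """Dry-run a query and return the bytes it would read (nothing is executed)."""
    # Imported lazily so the other helpers work without google-cloud-bigquery.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def bytes_to_gigabytes(total_bytes: int) -> float:
    # Assumption: 1 GB = 10**9 bytes; the notebook may use a different convention.
    return total_bytes / 10**9


def should_run(answer: str) -> bool:
    # Only the exact string "yes" confirms; a blank or any other answer does not.
    return answer == "yes"
```

In a notebook cell, these helpers could be combined as should_run(input("Run the query? ")) after printing bytes_to_gigabytes(estimate_bytes(client, sql)).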

Create and download the visualisation

Once the Input Variables, query and Execute queries cells have finished running, the notebook generates an interactive Plotly figure.

To download the plot as a .png, hover your cursor over the figure and select the camera icon located in the top right-hand menu.

Save a local copy of the data

Once the Input Variables, query and Execute queries cells have finished running, the notebook downloads the following files to your local machine:

  • a CSV file of the entry and exit data, including the number and proportion of page views
  • an HTML file of the Plotly figure
  • a text file of the metadata for the executed SQL queries

If none or only some of the files are downloaded, check the right-hand end of your browser's address bar. If you see a download icon with a red cross, select the icon, change the option to Always allow…, and then select Done.

Assumptions and caveats

This log contains a list of assumptions and caveats used in the spider diagram tool analysis.

Definitions

Assumptions are Red-Amber-Green (RAG)-rated according to the following definitions for quality and impact.

| RAG rating | Assumption quality | Assumption impact |
|---|---|---|
| Green | Reliable assumption, well understood and/or documented. Anything up to a validated and recent set of actual data. | Marginal assumptions whose changes have no or limited impact on the outputs. |
| Amber | Some evidence to support the assumption. May vary from a source with poor methodology to a good source that’s a few years old. | Assumptions with a relevant, even if not critical, impact on the outputs. |
| Red | Little evidence to support the assumption. May vary from an opinion to a limited data source with poor methodology. | Core assumptions of the analysis, whose change would drastically affect the outputs. |

Thank you to the Home Office Analytical Quality Assurance team for these definitions.

Group all pages that have less than 2.5% of sessions to (other)

  • Quality: Green (for visualisation purposes)
  • Impact: Green (for visualisation purposes)

Pages that contain less than 2.5% of overall sessions are difficult to see in the diagram both in terms of their volume and any associated labels.

To mitigate this, all pages with less than 2.5% of overall sessions are aggregated into a separate category, (other). This happens immediately after the tool has executed the SQL queries, but before the tool has rendered the visualisations and outputs.

As this is strictly for visualisation purposes, it is acceptable. However, the aggregation limits any downstream analysis of the outputs. As such, outputs of this tool should not be used for further quantitative analysis without a thorough understanding of this assumption’s implications.
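
The grouping rule can be illustrated with a short Python sketch. It is not the tool's own implementation: the function name group_small_pages is hypothetical, and it assumes that "overall sessions" means the total across all pages passed in.

```python
def group_small_pages(session_counts, threshold=0.025, other_label="(other)"):
    """Fold pages with less than `threshold` of total sessions into one bucket."""
    total = sum(session_counts.values())
    grouped = {}
    other = 0
    for page, count in session_counts.items():
        # Strictly "less than" the threshold, as described in the assumption.
        if total and count / total < threshold:
            other += count
        else:
            grouped[page] = count
    if other:
        # Add to any existing "(other)" entry rather than overwriting it.
        grouped[other_label] = grouped.get(other_label, 0) + other
    return grouped


counts = {"/find-a-job": 60, "/brexit": 38, "/page-a": 1, "/page-b": 1}
print(group_small_pages(counts))
# {'/find-a-job': 60, '/brexit': 38, '(other)': 2}
```

The total number of sessions is preserved, but the individual counts for /page-a and /page-b can no longer be recovered from the output, which is why downstream quantitative analysis is limited.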

Notebook tool excludes URL query parameters and anchors from page paths

  • Quality: Green
  • Impact: Green

The notebook tool removes all URL query parameters and anchors from page paths so that page views are associated with the general page path URL, rather than with specific query parameters and anchors. The tool removes the URL parameters and anchors during SQL execution.

The tool removes the URL parameters and anchors as the overall aim of the tool is to provide an understanding of which page paths have been viewed, regardless of query parameters and anchors. This is in line with Google Analytics, which also excludes query parameters and anchors from the page path.
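
The tool does this stripping in SQL, and that SQL is not reproduced here. As an illustration only, the same idea can be sketched in Python; the function name strip_query_and_anchor is hypothetical.

```python
def strip_query_and_anchor(page_path: str) -> str:
    """Return the page path without any anchor (#...) or query string (?...)."""
    # Drop the anchor first, then the query string, keeping only the path.
    without_anchor = page_path.split("#", 1)[0]
    return without_anchor.split("?", 1)[0]


print(strip_query_and_anchor("/find-a-job?q=nurse#results"))
# /find-a-job
```

With this rule, page views for /find-a-job?q=nurse and /find-a-job#results are both counted under /find-a-job.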

This page was last reviewed on 19 August 2024. It needs to be reviewed again on 19 February 2025.