
A/B and multivariate testing

You can use A/B or multivariate (MV) testing to test hypotheses and optimise webpages.

In A/B testing, you create 2 or more completely different versions of a webpage and split web traffic between those versions. You use A/B testing to test completely different website ideas.

In MV testing, you identify the areas of a webpage that you want to test, and then create variations of the webpage to test against each other. You use MV testing to optimise and refine an existing webpage without having to do a significant redesign.

In both methods, you present users with a variant at random, and you use statistical analysis to determine which variant performs better for a defined conversion goal.

How to do A/B and MV testing

There are 6 steps to doing A/B and MV testing:

  • research to prepare for the test
  • form a clear and unambiguous hypothesis
  • design the test
  • build and quality assure the test
  • run the test
  • analyse the results

Research to prepare for the test

You should collect data to help you form a hypothesis, so gather as much relevant data as you can.

Both qualitative and quantitative research can help you prepare for your test by identifying areas that need improvement.

Once you’ve identified an area that needs improvement, you can go on to the next step.

Form a clear and unambiguous hypothesis

A hypothesis is a prediction you form before running an experiment.

The hypothesis states what is being changed, what you think the outcome will be and why.

Form the hypothesis in the format “If…, then…, because…”, filling in the blanks as “If [thing you’re changing], then [result of the change], because [rationale]”.

For example, “If I change how I signpost page A on page X, then I will see more users clicking through from page X to page A, because they will be able to find the content they need, which analysis of user journeys in GA and user feedback shows they have been struggling to do”.

The hypothesis shows that you’ve properly considered what you’re testing and why, and should be documented somewhere reliable. Some A/B testing tools let you store hypotheses alongside the test.

Design the test

Now that you have a hypothesis, design a test to prove or disprove it.

For the previous example, you could sketch adding a banner to page X or changing the font colour of the link in question.

At this stage, if the test is part of a product sprint, you could run a crazy eights workshop.

You must also decide who you want to test. For example, if you’re testing a banner for a new mobile app, you could decide to only do this test on mobile devices.

Finally, you must decide what the primary metric is for the test. The primary metric allows you to prove or disprove your hypothesis. In the example hypothesis, the primary metric could be clicks from page X to page A.

Build and quality assure the test

Build the test in your testing tool, including any metrics you need.

Decide how long you want the test to run. A good starting point is to run the test for at least a full week, to account for variation by time of day and day of the week.

Alternatively, if you have a minimum detectable effect in mind, you could calculate the sample size needed to achieve significant results. However, you will not always have an expected minimum detectable effect.
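As a rough illustration, a sample size calculation could look like the following Python sketch. It uses the statsmodels library and assumes a baseline conversion rate of 5% and a minimum detectable effect of one percentage point; both figures are made up for the example.

    # Sketch: estimate the sample size per variant for a two-sided test on
    # conversion rates. The baseline rate and minimum detectable effect are
    # illustrative assumptions, not figures from a real test.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    baseline_rate = 0.05   # assumed current conversion rate (5%)
    target_rate = 0.06     # baseline plus the minimum detectable effect

    effect_size = proportion_effectsize(baseline_rate, target_rate)

    n_per_variant = NormalIndPower().solve_power(
        effect_size=effect_size,
        alpha=0.05,   # 5% significance level
        power=0.8,    # 80% chance of detecting a real effect of this size
        ratio=1.0,    # equal traffic split between the variants
    )
    print(f"Approximate users needed per variant: {n_per_variant:.0f}")

Dividing the required sample size by your expected daily traffic gives a rough idea of how long the test would need to run.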

If the test is complex, you may need to ask for a developer’s help.

Get the test quality assured (QA) by at least one other person.

QA the test across different criteria like multiple devices and browsers, or sign-in state if appropriate.

Check that the metrics are recording correctly.

If you are doing the QA, try to act like real users do, instead of like a power user.

Run the test

Push the test live to real users. Make sure that you:

  • choose an appropriate time to push your test live
  • split your sample groups equally and randomly, which is possible with most testing tools

You can start your test off with a small percentage of total eligible users first. If you do this, make sure that the proportions in the variants are equal. For example, you may start a test with 1% of traffic in A, 1% in B, and 98% not in test.
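Most testing tools handle this assignment for you, but as an illustration of the idea, the following Python sketch assigns users to a variant deterministically from a hashed user identifier, with a small, equal share of traffic in each variant. The identifiers, test name and percentages are made up for the example.

    # Sketch: deterministic, evenly spread assignment of users to test buckets.
    # Hashing a stable user identifier means a returning user always sees the
    # same variant. All names and numbers here are illustrative only.
    import hashlib

    def assign_bucket(user_id: str, test_name: str, share_per_variant: float = 0.01) -> str:
        """Return 'A', 'B' or 'not in test' for this user."""
        digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
        position = int(digest, 16) / 16 ** len(digest)  # uniform value in [0, 1)
        if position < share_per_variant:
            return "A"
        if position < 2 * share_per_variant:
            return "B"
        return "not in test"

    # Example: 1% of traffic in A, 1% in B, 98% not in test.
    print(assign_bucket("user-12345", "page-x-signposting"))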

While the test is live, you can restrict access to live conversion metrics to stop people sharing results widely before the results are significant.

Check the test is working correctly and recording your metrics.

You should confirm who has the authority to stop the test and make sure that any on-call developers can switch off tests in case of bugs identified outside working hours.

A/B testing on GOV.UK

There is GOV.UK-specific documentation in the GOV.UK developer docs on how A/B testing works and how to run an A/B or multivariate test from a technical perspective.

When developers implement A/B tests, the "govuk:ab-test" meta tag will be populated. The content of this meta tag is picked up and sent to Google Analytics 4 (GA4) in the ab_test custom parameter, and will appear in GA4 in the ‘AB test’ custom dimension. Analysts can use this dimension to obtain and compare metrics for each version of the page being tested.
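As a sketch of how an analyst might pull these figures programmatically, the following Python example uses the GA4 Data API to break down users by the A/B test dimension. The property ID, date range and dimension name are assumptions for illustration; check how the ab_test parameter has been registered as a custom dimension in your GA4 property before relying on the exact name.

    # Sketch: compare users per test variant using the GA4 Data API.
    # The property ID, dates and dimension name are illustrative assumptions.
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import (
        DateRange, Dimension, Metric, RunReportRequest,
    )

    client = BetaAnalyticsDataClient()  # uses Application Default Credentials

    request = RunReportRequest(
        property="properties/123456789",                      # hypothetical property ID
        dimensions=[Dimension(name="customEvent:ab_test")],   # event-scoped custom dimension
        metrics=[Metric(name="totalUsers")],
        date_ranges=[DateRange(start_date="2024-08-01", end_date="2024-08-14")],
    )

    for row in client.run_report(request).rows:
        print(row.dimension_values[0].value, row.metric_values[0].value)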

Note that the register of A/B tests must be updated whenever an A/B test is run on GOV.UK.

Analyse the results

Once your test is complete, analyse the results.

You have already decided on the primary metric to use for the statistical significance calculations.

You can also analyse any other metrics you chose to record or any changes in user feedback.

This step does not necessarily need to be completed by the team or person who suggested the test idea. If you do not have the necessary skills, ask for help from performance analysts or data scientists.

If you have set up the test correctly in your testing tool, the tool will often provide its own statistical significance calculations.
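If you need to run the calculation yourself, a common approach for a click-through primary metric is a two-proportion z-test. The following Python sketch uses the statsmodels library, with made-up counts of users and clicks for each variant.

    # Sketch: two-proportion z-test on the primary metric (for example, clicks
    # from page X to page A). The counts are made up for the example.
    from statsmodels.stats.proportion import proportions_ztest

    clicks = [530, 610]      # conversions in variant A and variant B
    users = [10000, 10000]   # users exposed to each variant

    z_stat, p_value = proportions_ztest(count=clicks, nobs=users)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

    if p_value < 0.05:
        print("The difference is statistically significant at the 5% level.")
    else:
        print("The result is inconclusive at the 5% level.")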

Whether you have a clear winning variant or your results are inconclusive, a properly designed test will provide useful insights and can inform future iterations.

You can also repeat the process to keep testing and learning.

This page was last reviewed on 14 August 2024. It needs to be reviewed again on 14 February 2025.