Guidelines

A/B Testing

Collect reliable information on real users’ behaviour with relatively few resources.

Image by Storyset from Freepik

Test Phases

Study Design: Planning & Recruiting

Time: 3-5 days

  • Defining the scope and purpose
  • Deciding on location, equipment, and testing approach
  • Creating a testing guide with a task list and relevant questions
  • Reaching out with an incentive plan
  • Scheduling & screening

Conducting the Study

Time: 1-3 days if moderated, several weeks if unmoderated

  • Prepare materials and set up the test environment
  • Motivate users to share their thoughts & take notes (if moderated)

Study Evaluation

Time: 1-2 days

  • Clean up the data and enrich it with notes
  • Identify patterns or recurring problems across the testers
  • Prioritize the problems discovered and share a report

Implementation

Time: ongoing

  • Create a repository to track any changes to your prototype and the rationale behind them
  • Check against defined criteria whether the changes you made had the desired impact


Study Design

Read here about the different options for your study – your research question should always guide you in choosing the right format, approach, and setting!

Start with your research question

1. In your research question, you define what you are looking for in the A/B test, including what data you need to answer your question:

  • People’s thoughts and associations
  • Findability on your website
  • Usability for completing a specific task

2. Depending on your research question, define exploratory (open-ended) or directed (answer-oriented, often with a measurable success criterion) tasks or metrics to measure during the A/B test

Examples

  • Exploratory task: Use the App to find advice that you could apply in your field.
  • Directed task: Change the language in which the information is shown to you in the App.
  • Metric: % of users who click on the redesigned button (when comparing 2 designs)
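A click-rate metric like the one above eventually needs a check for whether the difference between the two designs is larger than random noise. A minimal sketch of that comparison (the counts and the significance threshold are illustrative assumptions, not from this guide), using only the Python standard library:

```python
import math

def conversion_rate(clicks: int, visitors: int) -> float:
    """Share of users who clicked, e.g. on the redesigned button."""
    return clicks / visitors

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test: is the gap between two click rates
    larger than chance alone would explain?"""
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (clicks_b / n_b - clicks_a / n_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: variant A converts 100/1000 users, variant B 160/1000
z, p = two_proportion_z_test(100, 1000, 160, 1000)
```

With these made-up numbers the p-value falls well below a conventional 5 % threshold, so variant B would be declared the winner; a p-value above your predefined threshold would mean the test is inconclusive.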

How to implement your study

Ask yourself the following three questions to define the elements of your test:

Why Are You Testing?

Single vs. Multi-Variant

What kind of insights are you seeking? Do you want to understand whether variant A or B performs better for the users? Or do you want to understand different elements of your product and their relationship to different headlines, button colors, etc.?

Single Variant Testing

  • Test A against B (both new):
    e.g. two text variants or design variants
  • Test new version against old:
    e.g. one new feature variant of your existing product or service

Why it’s useful

Clear feedback on user behavior

Potential challenges

Data easily impacted by outside factors (seasonality, etc.)
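Whichever variant structure you choose, each returning user should keep seeing the same variant, otherwise the behavioral data gets muddied. One common approach (a sketch, assuming users carry a stable ID; the experiment name here is hypothetical) is deterministic bucketing via a hash:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Hash the user ID together with the experiment name, so the same
    user always lands in the same bucket, but buckets differ per experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user keeps their variant across visits:
first = assign_variant("user-42", "button-redesign")
again = assign_variant("user-42", "button-redesign")
```

Because the split is derived from a hash rather than stored state, no database lookup is needed, and across many users the buckets come out roughly 50/50.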

Where Are You Testing?

In Person vs. Remote

Do you need to sit with the users and observe them? Or do you prefer to get behavioral insights from a lot of different, remote testing users?​

In Person Testing

  • You meet your users physically for the testing – e.g. in your office/meeting room or where your users are
  • Always involves a moderator

Why it’s useful

A moderator can observe and record the user’s body language, gestures, and non-verbal cues

Good for testing with people who have limited internet access or digital skills

Potential challenges

Requires more time, logistics, and budget (including compensation payments)

Timelines need to fit users’ availability

How Are You Testing?

Moderated vs. Unmoderated

Do you want a facilitator who guides participants through the tasks and can probe their answers? Or should users complete the test on their own while you collect the data afterwards?

Moderated Testing

  • A real person facilitates (moderates) the testing, either physically or virtually
  • Can be done remotely or in person

Why it’s useful

The moderator can ask individual follow-up questions

The moderator can guide and support users during the testing (e.g. with a complex feature)

The moderator can observe body language and non-verbal cues

Potential challenges

Moderator might introduce bias

More time (logistics) and budget needed

Best Practices

Here you will find best-practice examples with helpful tips and tricks.

Do’s and Don’ts

Do’s

  • Create a strong hypothesis to test and a goal to achieve with the test – how will you know that one variant worked better than the other?
  • Define the threshold for statistical validity that the results must reach to be meaningful for you (here is a tool to help you calculate it)
  • Make sure to control other influencing variables that might affect the validity of your results (e.g. seasonal variability)
  • Be ready to accept that your test failed and that your new idea does not fix the problem. If the result is "no difference", you might simply use the version you prefer.
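The validity threshold goes hand in hand with a minimum sample size: with too few users per variant, even a real improvement drowns in noise. A rough sketch of the standard two-proportion calculation behind such calculators (the baseline rate, lift, alpha, and power values are illustrative assumptions):

```python
import math
from statistics import NormalDist

def min_sample_size(p_base: float, lift: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant to detect an absolute `lift`
    over a baseline conversion rate `p_base` (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = nd.inv_cdf(power)            # desired statistical power
    p_new = p_base + lift
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)

# Detecting a 2-point lift over a 10 % baseline takes a few thousand users per variant
needed = min_sample_size(0.10, 0.02)
```

Collect at least this many users per variant before evaluating once – which is also why the Don’ts below warn against stopping a test early.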

Don’ts

  • Don't overuse this method – you can rely on expertise to identify which ideas are worth testing. Prior to A/B testing, do some observation and user interviews to identify crucial bottlenecks in your product and the best entry points for improvement. Then design alternatives and test them.
  • Don't run ABC tests – test different parts or hypotheses one after another (multivariant tests are A/B and A/C tests)
  • Don't abort a test before the necessary data is in because you think you see a tendency. Equally, don't leave the test running forever to force a positive result.

Potential Bias To Be Aware Of

Find a detailed overview of potential biases with counter-actions here. Below is a list of potential biases to be aware of when conducting A/B tests.

The Recency Effect

People tend to give more weight to their most recent experiences. They form new opinions biased towards the latest news, e.g. by focusing only on the problems found in the latest usability session.


Anchoring Bias​

When people make decisions, they tend to rely too heavily on one piece of information – often a trait that already exists. A famous example comes from Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”


Social Desirability / Friendliness Bias

People tend to make more “socially acceptable” decisions when they are around other people. The same holds true for interviews: people want to make you feel good and will answer what they think you find pleasant and acceptable.


The Hawthorne Effect

The very act of being observed can cause participants to change their behavior. The quality of observational data is heavily impacted by this.