Guidelines
A/B Testing
Collect reliable information on real users’ behaviour with relatively few resources.
Image by Storyset from Freepik
Test Phases
Time: 3-5 days
Study Design
Planning & Recruiting
- Defining the scope and purpose
- Deciding on location, equipment and testing approach
- Creating a testing guide with a task list and relevant questions
- Reach out with an incentive plan
- Scheduling & Screening
Time: 1-3 days if moderated, several weeks if unmoderated
Conduct Study
- Prepare materials and set up test environment
- Motivating users to share their thoughts & take notes (if moderated)
Time: 1-2 days
Study Evaluation
- Clean up the data & enrich with notes
- Identify patterns of processes or problems across the testers
- Prioritize problems discovered and share report
Time: ongoing
Implementation
- Create a repository to track any changes you make to your prototype and the rationale behind them
- Check against your defined criteria whether the changes you made had the desired impact
Study Design
Read here about the different options for your study – your research question should always guide you in choosing the right format, approach, and setting!
Start with your research question
1. In your research question, you define what you are looking for in the A/B test, including what data you need to answer your question:
- People’s thoughts and associations
- Findability on your website
- Usability for doing a specific task
2. Depending on your research question, define exploratory (open-ended) or directed (answer-oriented, often with measurable success) tasks or metrics to measure during the A/B test
Examples
- Exploratory task: Use the App to find advice that you could apply in your field.
- Directed task: Change the language in which the information is shown to you in the App.
- Metric: % of users that click on the redesigned button (when comparing 2 designs)
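The metric example above can be computed directly from visitor and click counts. The following is a minimal sketch in Python; the variant names and all counts are made-up values for illustration, not real test data.

```python
def click_rate(clicks: int, visitors: int) -> float:
    """Return the share of visitors who clicked, as a percentage."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return 100.0 * clicks / visitors


# Hypothetical counts for two button designs (illustration only)
results = {
    "A (current button)": {"visitors": 1000, "clicks": 50},
    "B (redesigned button)": {"visitors": 1000, "clicks": 65},
}

# Compute the metric once per variant so the two designs can be compared
rates = {
    name: click_rate(counts["clicks"], counts["visitors"])
    for name, counts in results.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of users clicked")
```

A difference between the two percentages is only a first indication; whether it is meaningful depends on how many users took part in the test.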
How to implement your study
Ask yourself the following three questions to define the elements of your test:
Why Are You Testing?
Single vs. Multi Variant
What kind of insights are you seeking? Do you want to understand whether variant A or B performs better for the users? Or do you want to understand different elements of your product and their relationship to different headlines, button colors etc.?
Single Variant Testing
- Test A against B (both new): e.g. two text variants or design variants
- Test new version against old: e.g. one new feature variant of your existing product or service
Why it’s useful
Clear feedback on user behavior
Potential challenges
Data easily impacted by outside factors (season etc.)
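One common way to limit the influence of outside factors is to show both variants during the same period and split users between them at random. The sketch below assumes each user has a stable ID; the function name assign_variant and the experiment label "button-redesign" are hypothetical and chosen only for illustration.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "button-redesign") -> str:
    """Assign a user to variant 'A' or 'B' in a stable, roughly 50/50 split.

    Hashing the experiment label together with the user ID means the same
    user always sees the same variant, without storing the assignment.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


# Example: assign a few hypothetical users
for uid in ["user-1", "user-2", "user-3"]:
    print(uid, "->", assign_variant(uid))
```

Because both groups are exposed to the same season, campaigns and news, those factors affect variant A and variant B alike.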
Where Are You Testing?
Do you need to sit with the users and observe them? Or do you prefer to get behavioral insights from a lot of different, remote testing users?
In Person Testing
- You meet your users physically for the testing – e.g. in your office/meeting room or where your users are
- Is always moderated testing
Why it’s useful
A moderator can observe and record the user’s body language, gestures, and non-verbal cues
Good for testing with people who have limited internet access or digital skills
Potential challenges
Requires more time, logistics and budget (also for compensation payments)
Timelines need to meet users' availability
How Are You Testing?
Should a facilitator guide users through the test and ask follow-up questions? Or should users complete it on their own, without a moderator?
Moderated Testing
- A real person facilitates (moderates) the testing, either physically or virtually
- Can be done remotely or in person
Why it’s useful
The moderator can ask individual follow-up questions
The moderator can guide and support users during the testing (e.g. with a complex feature)
The moderator can observe body language and non-verbal cues
Potential challenges
Moderator might introduce bias
More time (logistics) and budget needed
Best Practices
Find here best practice examples with helpful tips and tricks.
Do’s and Don’ts
Do’s
- Create a strong hypothesis that you test and a goal to achieve with the test – How will you know that one variant worked better than the other?
- Define the threshold for statistical validity for the results to be meaningful for you (Here is a tool to help you calculate it)
- Make sure to control other influencing variables that might affect the validity of your results (e.g. seasonal variability)
- Be ready to accept that your test failed and that your new idea does not solve your problem. If the result is "no difference", you can use the version you prefer.
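To make the threshold for statistical validity concrete, you can estimate before the test how many users each variant needs. The sketch below uses the standard normal-approximation formula for comparing two proportions; the baseline rate, the target rate, the 5% significance level and the 80% power are assumptions picked for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_base: float, p_target: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of users needed per variant (two-sided test).

    p_base:   expected rate of the current version, e.g. 0.05 for 5%
    p_target: rate you hope the new version reaches, e.g. 0.065
    """
    if p_base == p_target:
        raise ValueError("p_base and p_target must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2
    return math.ceil(n)


# Hypothetical scenario: 5% of users click today, you hope for 6.5%
needed = sample_size_per_variant(0.05, 0.065)
print(f"Users needed per variant: {needed}")
```

The smaller the difference you want to detect, the more users you need, which is one reason to test only changes you expect to matter.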
Don’ts
- Don't overuse this method. You can rely on expertise to identify which ideas are worth testing. Prior to A/B testing, do some observation and user interviews to identify crucial bottlenecks in your product and the best entry points for improvement. Then design alternatives and test them.
- Don't run A/B/C tests – test different parts or hypotheses one after another (multivariant tests are A/B - A/C tests)
- Don't abort the test before the necessary data is in because you think you see a tendency. Also, don't leave the test running forever to force a positive result.
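The rule about not stopping early can be built into the evaluation itself. The sketch below compares two click rates with a two-proportion z-test, but only once both variants have reached a sample size that was planned in advance. The function names, the planned size of 4000 users per variant and all counts are hypothetical values for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:  # nobody or everybody clicked: no measurable difference
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def evaluate(clicks_a: int, n_a: int, clicks_b: int, n_b: int,
             planned_n: int, alpha: float = 0.05) -> str:
    """Evaluate the test only after each variant reached planned_n users."""
    if min(n_a, n_b) < planned_n:
        return "keep collecting data"
    z, p_value = two_proportion_z_test(clicks_a, n_a, clicks_b, n_b)
    if p_value < alpha:
        return "B performs better" if z > 0 else "A performs better"
    return "no significant difference"


# Hypothetical counts, with 4000 users per variant planned in advance
print(evaluate(50, 1000, 65, 1000, planned_n=4000))
print(evaluate(200, 4000, 260, 4000, planned_n=4000))
```

Fixing the sample size and the significance level before the test starts guards against both mistakes named above: stopping as soon as a tendency appears, and running the test until it happens to show a positive result.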
Potential Bias To Be Aware Of
Find a detailed overview of potential biases with counteractions here. Below is a list of potential biases to be aware of when conducting A/B tests.
The Recency Effect
People tend to give more weight to their most recent experiences. They form new opinions biased towards the latest news, e.g. by focusing only on the problems found in the latest usability session.
Anchoring Bias
When people make decisions, they tend to rely too heavily on one piece of information or on a trait that already exists. A famous example is from Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”
Social Desirability / Friendliness Bias
People tend to make more “socially acceptable” decisions when they are around other people. The same holds true for interviews: people want to make you feel good and will give the answers they think you find pleasant and acceptable.
The Hawthorne Effect
The very act of being observed can cause participants to change their behavior. The quality of observational data is heavily impacted by this.
Reading Recommendation
6 Essential Tips for Effective A/B Testing by Adobe XD
Define Stronger A/B Test Variations Through UX Research by NN Group
A/B Testing in UX Design: When and Why It’s Worth It by Adobe XD
Overview and ideas on what to test in A/B test by UX Design Institute