Questions tagged [ab-test]

A/B testing, also known as split or bucket testing, is a controlled comparison of the effectiveness of variants of a website, email, or other commercial product.

"A/B test" (also "split test" or "bucket test") is a colloquial term for a controlled experiment in which users are randomly assigned to one of several variants of a product, often a website feature.

The response (dependent) variable is most often count data (such as clicks on links or sales) but may be a continuous measure (such as time on site). Count data are sometimes converted to rates for analysis.
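As a rough illustration of analysing counts as rates, the sketch below converts click counts to click-through rates and compares them with a pooled two-proportion z-test. The function name and the counts are hypothetical, and the z-test is only one common choice, not a method prescribed by this tag.

```python
import math

def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test for a difference in rates, using the pooled standard error."""
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value

# Hypothetical counts: 200 clicks in 10,000 views (A) vs 250 clicks in 10,000 views (B)
rate_a, rate_b, z, p_value = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"rate A={rate_a:.4f}, rate B={rate_b:.4f}, z={z:.2f}, p={p_value:.4f}")
```

The normal approximation behind this test assumes reasonably large counts in both groups; for small samples an exact test may be more appropriate.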

Because they create temporary variants of 'live' websites, online A/B tests must overcome several challenges that are uncommon in traditional experiments on human preference. For example, differential caching of test versions may degrade website performance for some versions. Users may be shown multiple variants if they return to a website and are not successfully identified by cookies or login information. Moreover, non-human activity (search engine crawlers, email harvesters, and botnets) may be mistaken for human users.

Useful References:

Kohavi, Ron, Randal M. Henne, and Dan Sommerfield. "Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO." (2007).

Kohavi, Ron, et al. "Trustworthy online controlled experiments: five puzzling outcomes explained." Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012.

51 questions
15
votes
2 answers

Analyzing A/B test results which are not normally distributed, using independent t-test

I have a set of results from an A/B test (one control group, one feature group) which do not fit a Normal Distribution. In fact the distribution resembles more closely the Landau Distribution. I believe the independent t-test requires that the…
teebszet
  • 253
  • 2
  • 6
8
votes
2 answers

A/B testing: How to calculate p-value on post test segments?

My question on A/B testing is about doing post-test segmentation analysis. For example: I run an A/B test on my website to track bounce rate. For the treatment group, I put a video to explain my company. For the control group I put just plain…
jxn
  • 233
  • 2
  • 5
7
votes
1 answer

What does Prec@1 in fastText mean?

In the paper "Bag of Tricks for Efficient Text Classification", which is popular right now, the authors calculate prec@1 for the datasets in the experiments section. What does that mean?
Hima Varsha
  • 2,366
  • 16
  • 34
5
votes
2 answers

AB testing: When AA testing doesn't work

After 6 months of AB testing on our CRM tool (Oracle Responsys, but this could be true with any tool), the test exhibited some weird results, so we decided to pause everything and do some good old AA testing. AA testing consists of dividing…
WNG
  • 191
  • 5
5
votes
4 answers

What is the minimum size of the test set?

The mean of a population of binary values can be sampled with about 1000 samples at 95% confidence, and 3000 samples at 99% confidence. Assuming a binary classification problem, why is the 80/20% rule always used, and not the fact that with a few…
zzzbbx
  • 85
  • 1
  • 1
  • 7
5
votes
3 answers

What are the methods to ensure that the population split for A/B test is random?

Before launching an A/B test, what are the methods to ensure that the population split into control and target groups is random with respect to a particular label, say purchase rate?
vick
  • 91
  • 1
  • 4
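One way to approach the question above is to assign users deterministically by hashing their IDs and then compare a pre-experiment metric across the resulting groups. The sketch below is only an illustration under assumptions: the experiment name `exp-1`, the user IDs, and the simulated purchase history are all hypothetical, and hash-based bucketing is one common technique rather than the method discussed in the answers.

```python
import hashlib
import random

def assign_group(user_id, experiment="exp-1"):
    """Deterministically map a user to a group by hashing experiment name and user ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "treatment"

# Hypothetical pre-experiment data: user ID -> whether the user purchased before the test.
rng = random.Random(0)
purchased_before = {f"user-{i}": rng.random() < 0.1 for i in range(10_000)}

totals = {"control": [0, 0], "treatment": [0, 0]}  # [users, purchasers]
for user_id, purchased in purchased_before.items():
    group = assign_group(user_id)
    totals[group][0] += 1
    totals[group][1] += int(purchased)

# Group sizes and pre-period purchase rates should be close if the split is balanced.
for group, (users, purchasers) in totals.items():
    print(group, users, f"pre-period purchase rate={purchasers / users:.4f}")
```

A large gap in group sizes or in the pre-period rate would suggest a problem with the assignment mechanism; whether a given gap is larger than chance alone would produce can be checked with a significance test.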
4
votes
1 answer

A/B testing randomization step

Let's say I want to measure a medicine’s impact on height. So I randomly split my user base into two groups (control and experiment, obviously). I calculated the average height for both groups before the experiment and found out there’s a 2 cm…
Helene
  • 143
  • 4
3
votes
3 answers

A/B Testing (Binomial Distribution vs Random Distribution)

When performing an A/B test for the number of clicks for users viewing (each view is an impression) two variants of an ad, a binomial distribution can be assumed where each variant has a constant click-through rate. Example: Two Ads, -> Ad one has…
DD.
  • 131
  • 2
3
votes
2 answers

How to control false positives in sequential A/B testing while keeping a low sample size?

I am working on a planned sequence of n independent A/B tests (run a maximum of n tests, or stop earlier if a good improvement is found), and in order to keep the overall significance level acceptable (0.05) I'm considering ways of controlling…
3
votes
2 answers

Is it scientifically correct to derive conclusions unrelated to the hypothesis from A/B test data?

Consider a software A/B test with the hypothesis that "the addition of feature F is predicted to increase metric X". At the end of the test, the data doesn't show any significant change in X, but it does show a significant increase in Y - something…
Jon Burgess
  • 133
  • 4
3
votes
1 answer

How would I chi-squared test these simple results from A/B experiment?

I have results from an A/B experiment where users could do one of three things: Watch, Interact, or Nothing. My data is like this:

       Watch  | Nothing | Interact
    A: 327445 | 271602  | 744702
    B: 376455 | 140737  | 818204

I tried to use the…
OneChillDude
  • 133
  • 1
  • 5
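For reference, a chi-squared test of independence on the 2×3 table from the excerpt above can be sketched in plain Python as follows. This is an illustrative calculation, not the accepted answer; the closed-form p-value used here holds only because this table has exactly 2 degrees of freedom.

```python
import math

# Counts copied from the question excerpt: rows are variants, columns are outcomes.
observed = [
    [327445, 271602, 744702],  # A: Watch, Nothing, Interact
    [376455, 140737, 818204],  # B: Watch, Nothing, Interact
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells,
# where expected counts assume the outcome is independent of the variant.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
# For exactly 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2={chi2:.1f}, df={df}, p={p_value:.3g}")
```

With samples this large, even small differences in proportions tend to be statistically significant, so an effect-size measure is usually worth reporting alongside the p-value.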
2
votes
3 answers

Analysis of Split (A/B) tests using Poisson and/or Binomial Distribution

Cross posting this from Cross Validated: I've seen this question asked before, but I have yet to come across a definitive source answering the specific questions: What's the most appropriate statistical test to apply to a small A/B test? What's the…
brycemcd
  • 121
  • 1
  • 4
2
votes
1 answer

A/B test results contradictory with offline machine learning model performance

This seems to be a common problem when bringing machine learning models to production. Let's say we have an optimized machine learning model which gives decent performance metric in the unseen testing dataset. We are quite satisfied with that, and…
CathyQian
  • 251
  • 2
  • 6
2
votes
1 answer

Permutation test on two groups

I am trying to use a permutation test to test my hypothesis. I want to make sure I am understanding the concept of permutation correctly. I have a control and an experimental group. Then I combine them and resample from the combined dataset randomly, calculating…
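The procedure described in this excerpt can be sketched roughly as below: pool both groups, repeatedly shuffle and re-split them, and count how often the shuffled difference in means is at least as extreme as the observed one. The function name, the data, and the choice of the difference in means as the test statistic are assumptions made for illustration.

```python
import random

def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data and re-splitting simulates random group labels.
        rng.shuffle(pooled)
        perm_control = pooled[:n_control]
        perm_treatment = pooled[n_control:]
        diff = sum(perm_treatment) / len(perm_treatment) - sum(perm_control) / len(perm_control)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for each group
control = [12.1, 9.8, 11.4, 10.2, 10.9, 9.5, 11.0, 10.4]
treatment = [12.9, 11.7, 13.2, 10.8, 12.4, 11.9, 12.6, 11.1]
observed_diff, p_value = permutation_test(control, treatment)
print(f"observed difference={observed_diff:.3f}, p={p_value:.4f}")
```

Shuffling without replacement is what makes this a permutation test; resampling with replacement would be a bootstrap instead.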
2
votes
0 answers

Firebase AB testing algorithm

We have run an AB test in Firebase which has the following results: I was also building my own Bayesian AB-test suite and was wondering how they came to these conclusions. What I was doing was querying the data of this test for the Control Group…
Boris Mulder
  • 83
  • 11