A/A Test

Preparation for the Experiment

Before conducting a controlled experiment, we must be confident that the data for each group are collected correctly. Several factors have to be taken into account:

Day-of-the-week effect. People behave differently on weekdays and weekends, so groups observed on different days may differ. Therefore, the data are collected over a full week;
Seasonality. During the holidays, users shop more actively, which can give a misleading picture of typical sales. Therefore, the data are collected in a period without holidays;
Growing number of users over time. More and more people become involved in the experiment as it runs, so each group should contain the same number of users.

So, we conducted an online experiment for three groups of users. Each group was tested for a full week, and an equal number of users took part in each group. The experiment took place in the off-season (there were no holidays that could have provoked an increase in sales).
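As a rough illustration of the "equal number of users" requirement, here is a minimal sketch of one way to split users into groups of the same size at random. The user IDs, the function name split_into_equal_groups, and the fixed seed are assumptions made for this example; the lesson does not say how its groups were actually formed.

```python
import numpy as np

def split_into_equal_groups(user_ids, n_groups=3, seed=0):
    """Shuffle user IDs and deal them into n_groups groups of equal size."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(user_ids))
    # Drop the remainder so that every group ends up with the same size
    usable = len(shuffled) - len(shuffled) % n_groups
    return np.split(shuffled[:usable], n_groups)

user_ids = np.arange(300)  # hypothetical user IDs, for illustration only
groups = split_into_equal_groups(user_ids)
print([len(g) for g in groups])  # → [100, 100, 100]
```

Random assignment with a fixed group size keeps the groups comparable, so that a later difference in the metric is less likely to come from how the users were allocated.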

The success metric of the experiment is the conversion rate. Before trusting any results, we need to check that they are sound.

Let's get acquainted with the data. Each of the two datasets has one hundred records and three columns. The first column, 'Male', is binary: 1 means the user is male, 0 means the user is female. The second column, 'Page view', holds the number of page views. The third column, 'Purchase', holds the number of purchases. Let's see what these tables look like:


# Import libraries
import pandas as pd 
from scipy.stats import mannwhitneyu

# Read .csv files
control_group_1 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_first.csv')
control_group_2 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_second.csv')

# Show the first rows of the first file
print(control_group_1.head())
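Beyond looking at the table, it can help to check that it matches the description above. The following is a minimal sketch of such a check, assuming the column names 'Male', 'Page view' and 'Purchase' and one hundred rows per file. The helper check_group and the tiny sample table are made up for illustration and are not part of the course data.

```python
import pandas as pd

def check_group(df, expected_rows=100):
    """Return a list of problems found in one group's table (empty if none)."""
    expected_columns = ['Male', 'Page view', 'Purchase']  # assumed column names
    missing = [c for c in expected_columns if c not in df.columns]
    if missing:
        return [f'missing columns: {missing}']
    problems = []
    if len(df) != expected_rows:
        problems.append(f'expected {expected_rows} rows, found {len(df)}')
    if not df['Male'].isin([0, 1]).all():
        problems.append("'Male' contains values other than 0 and 1")
    if (df['Page view'] <= 0).any():
        problems.append("'Page view' contains non-positive values")
    if (df['Purchase'] < 0).any():
        problems.append("'Purchase' contains negative values")
    return problems

# Tiny made-up table, for illustration only
sample = pd.DataFrame({'Male': [1, 0, 1], 'Page view': [10, 4, 7], 'Purchase': [2, 0, 1]})
print(check_group(sample, expected_rows=3))  # → []
```

The check on 'Page view' matters later: a row with zero page views would make the conversion (purchases divided by page views) undefined.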

The second file has the same structure as the first. But can we say that there is no statistically significant difference between the two groups?

We must make sure that no outside factors influenced our experiment. In other words, the metric should not differ between control group 1 and control group 2 by more than random variation can explain.
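Before any formal test, a quick descriptive comparison can already hint at trouble. The sketch below computes the mean and median of a per-user conversion (purchases divided by page views) for each group. The function conversion_summary and the numbers in the two small tables are invented for illustration; the column names are assumed to be 'Page view' and 'Purchase'.

```python
import pandas as pd

def conversion_summary(groups):
    """Mean, median and size of the per-user conversion for each named group."""
    rows = {}
    for name, df in groups.items():
        conversion = df['Purchase'] / df['Page view']
        rows[name] = {
            'mean': conversion.mean(),
            'median': conversion.median(),
            'n': len(conversion),
        }
    return pd.DataFrame.from_dict(rows, orient='index')

# Made-up numbers, for illustration only (not the course data)
group_a = pd.DataFrame({'Page view': [10, 10, 20], 'Purchase': [1, 2, 4]})
group_b = pd.DataFrame({'Page view': [10, 20, 5], 'Purchase': [2, 2, 1]})

summary = conversion_summary({'control_group_1': group_a, 'control_group_2': group_b})
print(summary)
```

Similar summary values do not prove that the groups are equivalent; they only show that nothing is obviously wrong. The statistical test is what quantifies whether the remaining difference is compatible with chance.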

Let's formulate hypotheses:

H₀: The conversion values of the two samples come from the same distribution (there is no difference between the groups).

Hₐ: The conversion values of the two samples come from different distributions (there is a difference between the groups).

Our first test:


# Import libraries
import pandas as pd
from scipy.stats import mannwhitneyu

# Read .csv files
control_group_1 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_first.csv')
control_group_2 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_second.csv')

# Define the metric: per-user conversion (purchases per page view)
control_group_1['Conversion'] = (control_group_1['Purchase'] / control_group_1['Page view']).round(2)
control_group_2['Conversion'] = (control_group_2['Purchase'] / control_group_2['Page view']).round(2)

# Run the two-sided Mann-Whitney U test
stat, p = mannwhitneyu(control_group_1['Conversion'], control_group_2['Conversion'], alternative='two-sided')

# Interpret the test result at the 5% significance level
alpha = 0.05
print(f'stat={stat:.3f}, p={p:.3f}')
if p > alpha:
    print('There is no statistically significant difference between the two samples')
else:
    print('There is a statistically significant difference between the two samples')

Since p > 0.05, we cannot reject the null hypothesis: the test found no statistically significant difference between the two control groups.

Looks easy, right?

In this code, we loaded two groups of data, control_group_1 and control_group_2, computed the conversion for each user, performed a U test using the mannwhitneyu function, and displayed the test results on the screen.
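The same steps can be wrapped in a small reusable helper. This is a minimal sketch, not the course's own implementation: the name aa_check, the returned dictionary, and the simulated samples are assumptions made for this example, and the data are randomly generated rather than read from the course files.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def aa_check(sample_1, sample_2, alpha=0.05):
    """Run a two-sided Mann-Whitney U test and report whether the groups look alike."""
    stat, p = mannwhitneyu(sample_1, sample_2, alternative='two-sided')
    return {'stat': float(stat), 'p': float(p), 'groups_look_alike': bool(p > alpha)}

# Simulated conversion values, for illustration only
rng = np.random.default_rng(42)
same_a = rng.uniform(0.0, 0.5, size=100)   # two samples drawn from the same range
same_b = rng.uniform(0.0, 0.5, size=100)
shifted = rng.uniform(0.4, 0.9, size=100)  # a sample drawn from a clearly higher range

print(aa_check(same_a, same_b))
print(aa_check(same_a, shifted))
```

Keep in mind that even when both groups truly come from the same distribution, a test at the 5% level will report a difference in roughly one run out of twenty, so a single failed A/A check is a signal to investigate rather than proof of a broken setup.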

Why was this particular test chosen? We'll talk about this in the next chapters.
