The Art of A/B Testing
Histograms and Box Plots
About Histograms
To evaluate a distribution visually, we build a histogram. If the distribution is far from normal, we should notice it right away.
Picture time! Let's build distributions for two groups on one graph.
    # Import the libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Read .csv files
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Plot histograms of the Impression columns on the same axes
    sns.histplot(df_control['Impression'], color='#1e2635', label='Group A')
    sns.histplot(df_test['Impression'], color='#ff8a00', label='Group B')

    # Add the legend and labels to the graph
    plt.legend(title='Groups')
    plt.xlabel('Impression')
    plt.ylabel('Frequency')
    plt.title('Distribution of Impressions')

    # Show the graph
    plt.show()
In this code, we use the sns.histplot function from the seaborn library. We call it twice, passing the column df_control['Impression'] and then df_test['Impression'], so that both histograms are drawn on the same axes and can be compared.
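To make it concrete what a histogram counts, here is a minimal sketch in plain Python. The sample values and the choice of four equal-width bins are assumptions made for illustration; sns.histplot picks its bins automatically, so its bars will generally differ from these.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot form bins of nonzero width")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical impression counts, not taken from the course dataset
impressions = [80, 95, 100, 102, 110, 115, 120, 160]
counts = histogram_counts(impressions, n_bins=4)
print(counts)
```

Each number in the result is the height of one bar. A single bar far out on one side, as the last bin here, is the kind of asymmetry that hints at a non-normal distribution.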
Are these distributions normal? Hard to tell...
Let's look at box plots:
About Boxplots
    # Import libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Read .csv files
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Add a label column to each dataframe marking it as the control or the test group
    df_control['group'] = 'Control group'
    df_test['group'] = 'Test group'

    # Concat control and test dataframes
    df_combined = pd.concat([df_control, df_test])

    # Draw one box per group
    sns.boxplot(data=df_combined, x='group', y='Impression', palette=['#1e2635', '#ff8a00'], medianprops={'color': 'red'})

    # Label the axes
    plt.xlabel('')
    plt.ylabel('Impression')
    plt.title('Comparison of Impressions')

    # Show the results
    plt.show()
To display two box plots on the same chart, we combine the data frames using the pd.concat function.
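The labeling and concatenation step can be tried in isolation. The sketch below uses two tiny, hypothetical data frames in place of the course CSV files, and passes ignore_index=True, which the lesson code does not use, so that the combined frame gets a fresh 0..n-1 index.

```python
import pandas as pd

# Hypothetical miniature data frames standing in for the course CSV files
df_control = pd.DataFrame({'Impression': [100, 120, 110]})
df_test = pd.DataFrame({'Impression': [130, 125]})

# Label each row with its group before stacking the frames
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Stack the rows of both frames; ignore_index=True renumbers the rows
df_combined = pd.concat([df_control, df_test], ignore_index=True)

print(df_combined)
print(df_combined['group'].value_counts())
```

Because every row now carries its group label, a single column can be used to split the combined data into one box per group.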
Next, we use the sns.boxplot function, passing the combined data frame df_combined to it. The x-axis shows the Control and Test groups, and the y-axis shows the values of the 'Impression' column. With the help of the matplotlib library, we label the plot and its axes.
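A box plot condenses each group into a handful of numbers. The sketch below computes them in plain Python for a hypothetical sample. It assumes the common convention, which is the matplotlib and seaborn default, that whiskers reach the most extreme points within 1.5 times the interquartile range (IQR) of the box and that anything beyond is drawn as an outlier. Quartile conventions vary between libraries, so treat the exact values as illustrative.

```python
import statistics


def box_plot_stats(values):
    """Return the numbers a box plot draws for one group."""
    data = sorted(values)
    # method='inclusive' interpolates linearly between the sorted data points
    q1, median, q3 = statistics.quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that lie inside the fences
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        'q1': q1,
        'median': median,
        'q3': q3,
        'whisker_low': inside[0],
        'whisker_high': inside[-1],
        'outliers': outliers,
    }


# Hypothetical impression counts, not taken from the course dataset
impressions = [80, 95, 100, 102, 110, 115, 120, 160]
stats = box_plot_stats(impressions)
print(stats)
```

A median that sits off-center in the box, whiskers of unequal length, or many outliers on one side all suggest asymmetry, but none of them settles the question of normality.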
Even with the box plots, it is not clear whether the distributions are normal. But we need to be sure about normality.
How can we check? Statistical tests come to the rescue; we will discuss them in the next chapter.