Histograms and Box Plots | Normality Check
The Art of A/B Testing

Histograms and Box Plots

About Histograms

To evaluate a distribution visually, start with a histogram. If the distribution is far from normal, the histogram should make that obvious right away.

Picture time! Let's plot the distributions of both groups on one graph.

    # Import the libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Read .csv files
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Plot histograms of the Impression columns
    sns.histplot(df_control['Impression'], color='#1e2635', label='Group A')
    sns.histplot(df_test['Impression'], color='#ff8a00', label='Group B')

    # Add the legend, axis labels, and title to the graph
    plt.legend(title='Groups')
    plt.xlabel('Impression')
    plt.ylabel('Frequency')
    plt.title('Distribution of Impressions')

    # Show the graph
    plt.show()

In this code, we use the sns.histplot function from the seaborn library. We call it twice, once with the column df_control['Impression'] and once with df_test['Impression'], so both histograms are drawn on the same axes and can be compared.
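To make it clearer what a histogram actually counts, here is a minimal sketch that computes the bin counts by hand with NumPy. The two samples are synthetic stand-ins (an assumption, not the course data), and the shared bin edges are a choice made here so that the bars of both groups cover the same intervals.

```python
import numpy as np

# Synthetic stand-ins for the two Impression columns (assumed values, not the course data)
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=100, scale=15, size=500)
group_b = rng.normal(loc=110, scale=15, size=500)

# Compute one set of bin edges from both samples so the bins line up
bin_edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=20)

# Count how many observations of each group fall into each bin
counts_a, _ = np.histogram(group_a, bins=bin_edges)
counts_b, _ = np.histogram(group_b, bins=bin_edges)

# Print the counts side by side, one row per bin
for left, right, a, b in zip(bin_edges[:-1], bin_edges[1:], counts_a, counts_b):
    print(f"[{left:7.1f}, {right:7.1f})  A: {int(a):3d}  B: {int(b):3d}")
```

If the same alignment is wanted in the plot, the edges could be passed as bins=bin_edges to both sns.histplot calls.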

Are these distributions normal? Hard to tell...

Let's look at box plots:

About Boxplots

    # Import libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Read .csv files
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Add a label column marking each row as belonging to the control or the test group
    df_control['group'] = 'Control group'
    df_test['group'] = 'Test group'

    # Concatenate the control and test dataframes
    df_combined = pd.concat([df_control, df_test])

    # Draw one box per group, with the median line in red
    sns.boxplot(data=df_combined, x='group', y='Impression', palette=['#1e2635', '#ff8a00'], medianprops={'color': 'red'})

    # Label the axes and add a title
    plt.xlabel('')
    plt.ylabel('Impression')
    plt.title('Comparison of Impressions')

    # Show the results
    plt.show()

To display two box plots on the same chart, we first add a 'group' label column to each data frame and then combine them with the pd.concat function.
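The combining step can be tried in isolation on a toy example. The frames below are tiny made-up stand-ins for df_control and df_test (assumed values, not the course data), and ignore_index=True is an optional extra used here so the combined rows get a fresh 0..n-1 index.

```python
import pandas as pd

# Tiny made-up frames standing in for df_control and df_test (assumed values)
df_control = pd.DataFrame({'Impression': [100, 120, 95]})
df_test = pd.DataFrame({'Impression': [130, 110, 125]})

# Label each row with the group it came from
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Stack the rows of both frames; ignore_index=True renumbers the rows
df_combined = pd.concat([df_control, df_test], ignore_index=True)

print(df_combined)
print(df_combined['group'].value_counts())
```

The result is a single "long" frame with one row per observation, which is the shape that the x='group', y='Impression' arguments of sns.boxplot expect.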

Next, we use the sns.boxplot function, passing the combined data frame df_combined to it. The x-axis shows the Control and Test groups, and the y-axis shows the values of the 'Impression' column. With the help of the matplotlib library, we label the axes and add a title to the plot.
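To read a box plot, it helps to know which numbers it draws. The sketch below computes them by hand, assuming the common convention of whiskers that reach the most extreme points within 1.5 times the interquartile range. The helper name box_plot_summary and the sample values are hypothetical and exist only for illustration.

```python
import numpy as np

def box_plot_summary(values, whis=1.5):
    """Return the statistics a box plot draws (hypothetical helper)."""
    values = np.asarray(values, dtype=float)

    # The box spans the first to the third quartile; the line inside is the median
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1

    # Points beyond these fences are drawn individually as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]

    return {
        'q1': q1,
        'median': median,
        'q3': q3,
        'whisker_low': inside.min(),   # whiskers end at the most extreme inside points
        'whisker_high': inside.max(),
        'outliers': outliers.tolist(),
    }

# Made-up sample with one clearly extreme value
summary = box_plot_summary([10, 12, 13, 14, 15, 16, 18, 40])
for name, value in summary.items():
    print(name, value)
```

For roughly normal data one would expect the median to sit near the middle of the box, whiskers of similar length, and only a few outliers. These are visual hints, not proof of normality.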

Even with the box plots, it is not clear whether the distributions are normal. But we need to be sure about normality.

How can we check it? Statistical tests come to the rescue; we will discuss them in the next chapter.

Section 2. Chapter 4