Learning Statistics with Python
Paired t-test
The following function conducts a paired t-test:
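The function itself did not survive on this page. As a minimal sketch, assuming the course relies on SciPy, a paired t-test can be run with `scipy.stats.ttest_rel`. The two arrays below are made-up illustration data, not the course data set.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: the same six subjects, measured twice.
before = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0])
after = np.array([14.0, 17.0, 12.0, 17.0, 14.0, 19.0])

# ttest_rel pairs the i-th element of the first sample with the i-th
# element of the second, so the order of observations matters.
result = stats.ttest_rel(before, after)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

By default the test is two-sided; the sign of the statistic follows the order of the arguments (first sample minus second).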
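This process resembles the one used for independent samples, but here we do not need to check the homogeneity of variance. The paired t-test works on the differences within each pair, so it does not assume that the two samples have equal variances.
Keep in mind that for a paired t-test the two samples must be the same size, because each observation in one sample is matched with exactly one observation in the other.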
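One way to see why the variances of the two samples do not need to match: a paired t-test gives the same result as a one-sample t-test on the pairwise differences. The sketch below checks this with synthetic data (the seed, sizes, and distribution parameters are arbitrary choices for illustration) using `scipy.stats.ttest_rel` and `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# Synthetic paired data; all parameters are arbitrary illustration values.
rng = np.random.default_rng(0)
before = rng.normal(100, 5, size=30)
after = before + rng.normal(1, 20, size=30)  # much larger spread than `before`

# Paired t-test on the two samples.
paired = stats.ttest_rel(after, before)

# One-sample t-test on the differences against a mean of zero.
one_sample = stats.ttest_1samp(after - before, popmean=0)

same_statistic = np.isclose(paired.statistic, one_sample.statistic)
same_pvalue = np.isclose(paired.pvalue, one_sample.pvalue)
print(same_statistic, same_pvalue)
```

Only the distribution of the differences enters the calculation, which is also why both samples must have the same length.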
With this information in mind, you can proceed to the task of conducting a paired t-test.
Here, you have data regarding the number of downloads for a particular app. Take a look at the samples: the mean values are nearly identical.
import pandas as pd
import matplotlib.pyplot as plt

# Read the data
before = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/Testing2.0/before.csv').squeeze()
after = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/Testing2.0/after.csv').squeeze()

# Plot histograms
plt.hist(before, alpha=0.7)
plt.hist(after, alpha=0.7)

# Plot the means
plt.axvline(before.mean(), color='blue', linestyle='dashed')
plt.axvline(after.mean(), color='gold', linestyle='dashed')
We establish the hypotheses:
- H₀: The mean number of downloads before and after the changes is the same;
- Hₐ: The mean number of downloads is greater after the modifications.
Conduct a paired t-test with this alternative hypothesis, using before and after as the samples.
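A one-sided alternative such as "greater after" depends on the order of the arguments. The sketch below assumes SciPy's `ttest_rel` with its `alternative` parameter (available in SciPy 1.6 and later) and uses synthetic stand-in data rather than the course files; the names `before_demo` and `after_demo` and the significance level of 0.05 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the download counts (not the course data).
rng = np.random.default_rng(42)
before_demo = rng.normal(500, 50, size=40)
after_demo = before_demo + rng.normal(10, 20, size=40)

# alternative='greater' tests whether the mean of (first - second) is > 0,
# so `after_demo` goes first to test "greater after the modifications".
greater = stats.ttest_rel(after_demo, before_demo, alternative='greater')

# Two-sided result for comparison: same statistic, different p-value.
two_sided = stats.ttest_rel(after_demo, before_demo)

alpha = 0.05  # assumed significance level
if greater.pvalue < alpha:
    print("Reject H0: downloads are greater after the changes.")
else:
    print("Fail to reject H0.")
```

Swapping the two arguments while keeping `alternative='greater'` would instead test whether the mean was greater before the changes.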