Advanced Probability Theory
Challenge: Using CLT to Compare Mean Values of Non-Gaussian Datasets
In the last chapter, we compared the expected values (means) of two Gaussian datasets. But what if the datasets are not Gaussian? Can we still compare their means?
Using the Central Limit Theorem to compare mean values
We can use the CLT to compare mean values of non-Gaussian datasets:
- If we have many observations, we can use the CLT to construct new features: instead of analyzing the raw observations, we analyze means computed over groups of observations. By the CLT, if each mean is calculated over sufficiently many independent observations from a distribution with finite variance, these means are approximately normally distributed;
- Apply Student's t-test, described in the previous chapter, to these means to test the hypothesis.
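The two steps above can be sketched as follows. This is a minimal illustration, not the course's reference solution: the helper name `block_means`, the block size of 100, and the exponential parameters are all assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def block_means(data, block_size):
    """Mean of each consecutive, non-overlapping block of `block_size` values.

    Hypothetical helper for this sketch; trailing values that do not fill
    a whole block are dropped.
    """
    data = np.asarray(data)
    n_blocks = len(data) // block_size
    trimmed = data[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).mean(axis=1)

# Two non-Gaussian (exponential) datasets; for the exponential
# distribution the `scale` parameter is the mean. Values are illustrative.
x = rng.exponential(scale=2.0, size=20_000)
y = rng.exponential(scale=2.5, size=20_000)

# Step 1: replace raw observations with block means (approximately normal by the CLT)
x_means = block_means(x, block_size=100)
y_means = block_means(y, block_size=100)

# Step 2: compare the block means with Student's t-test
stat, p_value = ttest_ind(x_means, y_means)
print(f"t-statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

Non-overlapping blocks are used here so that the block means remain independent, which the t-test assumes.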
Note
Different distributions require a different number of observations per mean before the means become approximately normal. This number is usually chosen experimentally with a normality test, for example the Shapiro-Wilk test (the shapiro function from scipy.stats).
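One way to run such an experiment is to try several group sizes and look at the Shapiro-Wilk p-value for each. The sketch below is only an illustration: the candidate sizes, the helper `block_means`, and the choice to test 200 means per size are assumptions, not values from the course.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=50_000)

def block_means(data, block_size):
    """Mean of each consecutive, non-overlapping block (hypothetical helper)."""
    data = np.asarray(data)
    n_blocks = len(data) // block_size
    return data[: n_blocks * block_size].reshape(n_blocks, block_size).mean(axis=1)

p_values = {}
for block_size in (1, 5, 30, 200):
    # Keep the number of tested means fixed so the p-values are comparable
    means = block_means(data, block_size)[:200]
    stat, p = shapiro(means)
    p_values[block_size] = p
    print(f"block_size={block_size:>3}: Shapiro-Wilk p-value = {p:.3g}")
```

A small p-value means normality is rejected for that group size; a large p-value only means the test found no evidence against normality, not that the means are exactly normal.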
Task
Now we will check the hypothesis that two exponential datasets have equal mean values using the Central Limit Theorem. Your task is:
- Import the ttest_ind function from the scipy.stats module to perform the t-test.
- Use the .mean() method to calculate the mean over the sliding window in the sliding_mean function.
- Use the shapiro() function to check the normality of the X_mean array.
- Specify the condition in the if statement to check the hypothesis.
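A rough sketch of how these pieces might fit together is shown below. The original exercise code is not reproduced here, so the signature of `sliding_mean`, the names `Y_mean` and `alpha`, the window size, and the data parameters are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import shapiro, ttest_ind

def sliding_mean(data, window, step):
    """Means over windows of length `window`, advancing `step` values each time.

    Assumed signature; the course's own sliding_mean may differ.
    """
    data = np.asarray(data)
    starts = range(0, len(data) - window + 1, step)
    return np.array([data[s : s + window].mean() for s in starts])

rng = np.random.default_rng(42)
# Two exponential datasets with the same mean (illustrative values)
X = rng.exponential(scale=1.0, size=10_000)
Y = rng.exponential(scale=1.0, size=10_000)

# step == window gives non-overlapping windows, so the means stay independent
X_mean = sliding_mean(X, window=100, step=100)
Y_mean = sliding_mean(Y, window=100, step=100)

# Check that the window means look approximately normal
print("Shapiro-Wilk p-value for X_mean:", shapiro(X_mean).pvalue)
print("Shapiro-Wilk p-value for Y_mean:", shapiro(Y_mean).pvalue)

# Test the hypothesis of equal means at an assumed significance level
alpha = 0.05
stat, p_value = ttest_ind(X_mean, Y_mean)
if p_value < alpha:
    print("Reject the hypothesis of equal means")
else:
    print("No evidence against equal means")
```

With a step smaller than the window, neighbouring means share observations and become correlated, which violates the independence assumption of the t-test; that is why this sketch sets the step equal to the window.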