What Is a P-value? | Testing of Statistical Hypotheses

The p-value is a probability used in statistical hypothesis testing. It is the probability of obtaining a test statistic at least as extreme as the one calculated from the sample data, assuming the null hypothesis is true. Comparing the p-value with the significance level therefore tells us whether the value of our criterion fell into the critical region: a p-value below the significance level means that it did.
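As a minimal sketch of this definition, assume the test statistic follows a standard normal distribution under the null hypothesis. The observed value `z_observed` below is a made-up illustrative number, not a result from the course data.

```python
from scipy.stats import norm

# Hypothetical observed test statistic (illustrative value only)
z_observed = 2.1

# Two-sided p-value: probability of a statistic at least as extreme
# as the observed one, assuming the null hypothesis is true
p_value = 2 * norm.sf(abs(z_observed))

# Significance level and the matching two-sided critical value
alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)

# The statistic is in the critical region exactly when p_value < alpha
in_critical_region = abs(z_observed) > z_critical

print(f"p-value = {p_value:.4f}")
print(f"critical value = {z_critical:.3f}")
print(f"in critical region: {in_critical_region}, p < alpha: {p_value < alpha}")
```

Both checks always agree, which is why comparing the p-value with the significance level is enough in practice.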

Hypothesis testing guideline

Step 1. We have samples and formulations of the main (null) and alternative hypotheses. First, we choose a significance level (the probability of a type I error, that is, of rejecting a true main hypothesis) that we are willing to accept;

Step 2. We choose the criterion by which we will test the hypothesis. Knowing the distribution of our initial data, we determine how the values of this criterion are distributed when the main hypothesis is true;

Step 3. We compute the value of the criterion (also called the test statistic) for our particular samples and then determine the p-value;

Note

If we cannot determine the true distribution of the criterion, we can use an empirical distribution instead. One method for constructing the empirical distribution will be discussed in the penultimate chapter of this section.

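Step 4. We reject the main hypothesis if the obtained p-value is less than the significance level. If the p-value is greater than the significance level, we fail to reject the main hypothesis: the data gives no sufficient evidence against it, which is not the same as proving it right. The rule applies even when the p-value is only slightly below the significance level, although such a borderline result is weak evidence and should be interpreted with caution.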
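The note above mentions falling back on an empirical distribution of the criterion. One possible way to build it is Monte Carlo simulation under the main hypothesis; this is a sketch of that general idea and not necessarily the method covered later in the course. The data is synthetic, and the choice of the sample median as the criterion is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observed" sample (illustrative, not the course data)
sample = rng.normal(loc=0.5, scale=1.0, size=30)
observed_stat = np.median(sample)

# Main hypothesis (assumed for this sketch): data comes from N(0, 1).
# Simulate many samples under it and record the criterion each time.
n_sim = 10_000
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sim, sample.size))
null_stats = np.median(null_samples, axis=1)

# Empirical two-sided p-value: share of simulated statistics
# at least as extreme as the observed one
p_value = np.mean(np.abs(null_stats) >= abs(observed_stat))

alpha = 0.05
print(f"observed median = {observed_stat:.3f}, empirical p-value = {p_value:.4f}")
print("reject main hypothesis" if p_value < alpha else "fail to reject main hypothesis")
```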

In practice, methods for testing most common hypotheses have already been implemented, so we do not need to carry out all the steps by hand: we just obtain the p-value and compare it with the chosen significance level.
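To illustrate, the sketch below walks through the four steps by hand for a one-sample t-test and then gets the same p-value from `scipy.stats.ttest_1samp`. The t-test, the synthetic sample, and the hypothesized mean `mu0` are assumptions chosen for this example; they are not part of the course dataset.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.3, scale=1.0, size=50)  # synthetic data

# Step 1: hypotheses and significance level
mu0 = 0.0      # main hypothesis: the population mean equals mu0
alpha = 0.05

# Steps 2-3: the t statistic follows Student's t distribution with
# n - 1 degrees of freedom when the main hypothesis is true
n = sample.size
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
p_manual = 2 * t.sf(abs(t_stat), df=n - 1)

# The same test using the ready-made implementation
result = ttest_1samp(sample, popmean=mu0)

# Step 4: compare the p-value with the significance level
print(f"manual:  t = {t_stat:.4f}, p = {p_manual:.4f}")
print(f"library: t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
print("reject main hypothesis" if result.pvalue < alpha else "fail to reject main hypothesis")
```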

Example

Let's look at an example. In Section 3 Chapter 2, we estimated the parameters of the population from the samples under an assumption about the population's distribution. Let's now check whether our data is normally / exponentially distributed with the estimated parameters.


from scipy.stats import kstest, norm, expon
import pandas as pd
import numpy as np

gaussian_samples = np.array(pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/gaussian_samples.csv', names=['Value']))
expon_samples = np.array(pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/Advanced+Probability+course+media/expon_samples.csv', names=['Value']))

# Specify significance level
alpha = 0.05

# Perform the Kolmogorov-Smirnov test against a normal distribution with the estimated parameters.
# Main hypothesis: the sample comes from the specified distribution.
# By default the test is two-sided: the alternative hypothesis is that the distributions are not equal.
test_statistic, p_value = kstest(gaussian_samples.flatten(), cdf=norm(loc=-0.042, scale=3.964).cdf)
if p_value > alpha:
    # Failing to reject does not prove normality; it only means no evidence against it
    print('No evidence against the normal distribution (main hypothesis not rejected)')
else:
    print('Data does not follow the specified normal distribution (main hypothesis rejected)')

# Perform the Kolmogorov-Smirnov test against an exponential distribution with the estimated rate
test_statistic, p_value = kstest(expon_samples.flatten(), cdf=expon(scale=1 / 0.497).cdf)
if p_value > alpha:
    print('No evidence against the exponential distribution (main hypothesis not rejected)')
else:
    print('Data does not follow the specified exponential distribution (main hypothesis rejected)')

In the code above we:

Loaded the datasets and specified the significance level alpha;
Used the Kolmogorov-Smirnov criterion to test the hypothesis about the distribution of our samples:
- used the kstest function to get the criterion value and the p-value;
- passed our data as the first argument of kstest and the CDF of the normal/exponential distribution with the specified parameters as the second argument;
Compared p_value with alpha to decide whether to reject the main hypothesis.
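To make the criterion value returned by kstest less of a black box, here is a minimal sketch that computes the two-sided Kolmogorov-Smirnov statistic by hand as the largest gap between the empirical CDF of the sample and the hypothesized CDF. It uses a synthetic sample and a standard normal reference distribution, both chosen as assumptions for illustration.

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=200)  # synthetic data

# Hypothesized distribution (assumed for this sketch)
dist = norm(loc=0.0, scale=1.0)

# Evaluate the hypothesized CDF at the sorted sample points
x = np.sort(sample)
n = x.size
cdf_values = dist.cdf(x)

# The empirical CDF jumps from (i - 1) / n to i / n at the i-th sorted point,
# so the largest gap must be checked on both sides of each jump
d_plus = np.max(np.arange(1, n + 1) / n - cdf_values)
d_minus = np.max(cdf_values - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

result = kstest(sample, dist.cdf)
print(f"manual statistic  = {d_manual:.6f}")
print(f"kstest statistic  = {result.statistic:.6f}, p-value = {result.pvalue:.4f}")
```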

Note

There are many statistical tests for checking the distribution of samples. Among the most popular are the Shapiro-Wilk test (scipy.stats.shapiro), the Anderson-Darling test (scipy.stats.anderson), and the chi-squared goodness-of-fit test (scipy.stats.chisquare).
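A short sketch of two of these tests follows. The samples and the die-roll counts are made up for illustration. Note that the Shapiro-Wilk test checks normality specifically, while the chi-squared test compares observed category counts with expected ones.

```python
import numpy as np
from scipy.stats import shapiro, chisquare

rng = np.random.default_rng(7)
alpha = 0.05

# Shapiro-Wilk: main hypothesis is that the sample is normally distributed
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)  # synthetic
expon_sample = rng.exponential(scale=2.0, size=200)       # synthetic

shapiro_normal = shapiro(normal_sample)
shapiro_expon = shapiro(expon_sample)
print(f"Shapiro-Wilk, normal sample:      p = {shapiro_normal.pvalue:.4f}")
print(f"Shapiro-Wilk, exponential sample: p = {shapiro_expon.pvalue:.4g}")

# Chi-squared goodness of fit: hypothetical counts of 120 die rolls,
# main hypothesis is that all six faces are equally likely
observed_counts = np.array([18, 22, 16, 25, 19, 20])
chi2_stat, chi2_p = chisquare(observed_counts)  # expected counts default to uniform
print(f"Chi-squared: statistic = {chi2_stat:.2f}, p = {chi2_p:.4f}")
print("reject fair die" if chi2_p < alpha else "fail to reject fair die")
```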

Section 4. Chapter 2
