Implementing Spread in Python
In this notebook, you'll learn how to calculate and visualize three fundamental statistical measures: mean, variance, and standard deviation.
These concepts are key for summarizing and understanding data in fields like data science and analytics.
We'll use a practical example — daily sales numbers — to walk through how to compute these statistics and visualize the results clearly.
Step 1: Define the Dataset
Here, we assign a NumPy array of daily sales figures to the variable `data`, so that every calculation below works on the same, consistent dataset.
import numpy as np
# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])
Step 2: Calculate Population Statistics
`np.mean()` takes the array as input and returns the average of all its elements, which summarizes the central tendency of the dataset. `np.var()` and `np.std()` then describe how widely the values spread around that average.
mean_val = np.mean(data) # Mean
variance_val = np.var(data) # Population variance (ddof=0 by default)
std_dev_val = np.std(data) # Population standard deviation
- `np.mean(data)` computes the arithmetic mean (average).
- `np.var(data)` calculates the population variance (divides by $n$).
- `np.std(data)` calculates the population standard deviation (the square root of the variance).
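Written out, these are the standard population formulas. Applying them by hand to the ten sales figures ($n = 10$, values $x_1, \dots, x_{10}$) gives a worked check of what the three calls should return:

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{155}{10} = 15.5 \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 = \frac{136.5}{10} = 13.65 \\
\sigma &= \sqrt{\sigma^2} = \sqrt{13.65} \approx 3.695
\end{aligned}
```

The sum of squared deviations, 136.5, comes from the deviations $-5.5, -0.5, -3.5, 2.5, 4.5, 6.5, -1.5, 1.5, -4.5, 0.5$ around the mean.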
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

mean_val = np.mean(data)  # Mean
variance_val = np.var(data)  # Population variance (ddof=0 by default)
std_dev_val = np.std(data)  # Population standard deviation

print(f"Mean: {mean_val}")
print(f"Variance (Population): {variance_val}")
print(f"Standard Deviation (Population): {std_dev_val}")
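As a cross-check, here is a minimal sketch that recomputes the same population statistics directly from their definitions and compares them with NumPy's results using `np.isclose`, which tolerates floating-point rounding. The names `manual_mean`, `manual_variance`, and `manual_std` are illustrative only, not part of the original lesson.

```python
import numpy as np

# Same daily sales figures as in Step 1
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])
n = data.size

# Population statistics computed from their definitions
manual_mean = data.sum() / n
squared_deviations = (data - manual_mean) ** 2
manual_variance = squared_deviations.sum() / n  # divide by n (population)
manual_std = np.sqrt(manual_variance)           # square root of the variance

# Compare against NumPy's built-in functions
print(np.isclose(manual_mean, np.mean(data)))
print(np.isclose(manual_variance, np.var(data)))
print(np.isclose(manual_std, np.std(data)))
```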
Step 3: Calculate Sample Statistics
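When the data is a sample drawn from a larger population, pass `ddof=1` ("delta degrees of freedom") to get an unbiased estimate of the population variance.
This applies Bessel's correction: NumPy divides the sum of squared deviations by $n - \text{ddof}$, so with `ddof=1` the variance is divided by $(n-1)$ instead of $n$. Taking the square root gives the conventional sample standard deviation, although that square root is not itself an exactly unbiased estimate of the population standard deviation.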
sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)
- `np.var(data, ddof=1)` → sample variance;
- `np.std(data, ddof=1)` → sample standard deviation.
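With Bessel's correction only the divisor changes. For these ten sales figures the mean is $\bar{x} = 15.5$ and the squared deviations from it sum to 136.5, so the sample statistics work out as:

```latex
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{136.5}{9} \approx 15.17 \\
s &= \sqrt{s^2} \approx 3.894
\end{aligned}
```

Equivalently, $s^2 = \sigma^2 \cdot \frac{n}{n-1} = 13.65 \cdot \frac{10}{9}$, which is why the sample variance is larger than the population variance for the same data whenever the values are not all equal.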
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)

print(f"Variance (Sample): {sample_variance_val}")
print(f"Standard Deviation (Sample): {sample_std_dev_val}")
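The `ddof` argument can be made concrete with a short sketch. NumPy's documented rule is that the variance divisor is `n - ddof`, so dividing the sum of squared deviations by hand should match `np.var` for both settings, and the sample variance should equal the population variance rescaled by `n / (n - 1)`. The variable names here are illustrative.

```python
import numpy as np

data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])
n = data.size

# Sum of squared deviations from the mean
sum_sq = ((data - data.mean()) ** 2).sum()

# NumPy divides the sum of squared deviations by (n - ddof)
for ddof in (0, 1):
    by_hand = sum_sq / (n - ddof)
    matches = np.isclose(by_hand, np.var(data, ddof=ddof))
    print(f"ddof={ddof}: {by_hand:.4f} (matches np.var: {matches})")

# Bessel's correction rescales the population variance by n / (n - 1)
population_var = np.var(data)         # ddof=0, divides by n
sample_var = np.var(data, ddof=1)     # ddof=1, divides by n - 1
print(np.isclose(sample_var, population_var * n / (n - 1)))
```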
Standard deviation is the square root of the variance, so it measures spread in the same units as the original data (daily sales here) rather than in squared units, which makes it easier to interpret.
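To see why matching units matters, the sketch below converts the sales counts into revenue using an assumed, purely hypothetical price of 5 per unit (`PRICE` is not part of the original example). Scaling the data by a constant scales the standard deviation by that same constant, so it stays in the data's units, while the variance scales by the constant squared.

```python
import numpy as np

data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

# Standard deviation is the square root of the variance
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)
print(np.isclose(sample_std, np.sqrt(sample_var)))

# Hypothetical: convert unit sales to revenue at an assumed price of 5 per unit
PRICE = 5
revenue = data * PRICE

# Std scales with PRICE (currency units); variance scales with PRICE**2 (currency squared)
print(np.isclose(np.std(revenue, ddof=1), PRICE * sample_std))
print(np.isclose(np.var(revenue, ddof=1), PRICE ** 2 * sample_var))
```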
1. Which NumPy function calculates the average of data?
2. What does setting `ddof=1` in NumPy's variance function do?
3. How do we calculate standard deviation with the NumPy library?