Implementing Spread in Python
In this notebook, you'll learn how to calculate and visualize three fundamental statistical measures: mean, variance, and standard deviation.
These concepts are key for summarizing and understanding data in fields like data science and analytics.
We'll use a practical example — daily sales numbers — to walk through how to compute these statistics and visualize the results clearly.
Step 1: Define the Dataset
Here, we assign an array to the variable data so that every calculation below works on the same dataset.
import numpy as np
# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])
Step 2: Calculate Population Statistics
The np.mean function takes the array as input and returns the average of all its elements, which summarizes the central tendency of the dataset. np.var and np.std then measure how widely the values spread around that average.
mean_val = np.mean(data) # Mean
variance_val = np.var(data) # Population variance (ddof=0 by default)
std_dev_val = np.std(data) # Population standard deviation
np.mean(data) computes the arithmetic mean (average).
np.var(data) calculates the population variance (divides by $n$).
np.std(data) calculates the population standard deviation (square root of the population variance).
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

mean_val = np.mean(data)      # Mean
variance_val = np.var(data)   # Population variance (ddof=0 by default)
std_dev_val = np.std(data)    # Population standard deviation

print(f"Mean: {mean_val}")
print(f"Variance (Population): {variance_val}")
print(f"Standard Deviation (Population): {std_dev_val}")
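To make the definitions behind these calls concrete, here is a minimal sketch that recomputes the same population statistics in plain Python, without numpy. The names n, manual_mean, squared_deviations, manual_variance, and manual_std_dev are illustrative and not part of the original notebook.

```python
import math

# Same daily sales values as above, as a plain Python list
data = [10, 15, 12, 18, 20, 22, 14, 17, 11, 16]

n = len(data)

# Mean: sum of the values divided by how many there are
manual_mean = sum(data) / n

# Population variance: average squared deviation from the mean (divide by n)
squared_deviations = [(x - manual_mean) ** 2 for x in data]
manual_variance = sum(squared_deviations) / n

# Population standard deviation: square root of the population variance
manual_std_dev = math.sqrt(manual_variance)

print(f"Mean: {manual_mean}")
print(f"Variance (Population): {manual_variance}")
print(f"Standard Deviation (Population): {manual_std_dev}")
```

For this dataset the mean is 15.5 and the population variance is 13.65, which should agree with np.mean(data) and np.var(data) up to floating-point rounding.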
Step 3: Calculate Sample Statistics
When the data is a sample drawn from a larger population, we pass ddof=1 to get an unbiased estimate of the population variance.
This applies Bessel's correction, dividing the sum of squared deviations by $(n-1)$ instead of $n$.
sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)
np.var(data, ddof=1) → sample variance;
np.std(data, ddof=1) → sample standard deviation.
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

sample_variance_val = np.var(data, ddof=1)   # Sample variance (divides by n - 1)
sample_std_dev_val = np.std(data, ddof=1)    # Sample standard deviation

print(f"Variance (Sample): {sample_variance_val}")
print(f"Standard Deviation (Sample): {sample_std_dev_val}")
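As a cross-check on what ddof=1 changes, the sketch below uses Python's standard-library statistics module instead of numpy (a substitution made here for illustration; the notebook itself uses numpy). It shows that the sample variance equals the population variance rescaled by n / (n - 1). The variable names are hypothetical.

```python
import math
import statistics

data = [10, 15, 12, 18, 20, 22, 14, 17, 11, 16]
n = len(data)

# statistics.pvariance divides by n (comparable to np.var with ddof=0)
population_variance = statistics.pvariance(data)

# statistics.variance divides by n - 1 (comparable to np.var with ddof=1)
sample_variance = statistics.variance(data)

# Bessel's correction rescales the population variance by n / (n - 1)
rescaled = population_variance * n / (n - 1)

print(f"Variance (Population): {population_variance}")
print(f"Variance (Sample): {sample_variance}")
print(f"Population variance * n / (n - 1): {rescaled}")
print(f"Match: {math.isclose(sample_variance, rescaled)}")
```

Because n / (n - 1) is greater than 1, the sample variance is always somewhat larger than the population variance of the same data, and the gap shrinks as n grows.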
Standard deviation is the square root of variance, giving a measure of spread in the same units as the original data, making it easier to interpret.
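Written out as formulas, the quantities computed in this notebook are as follows. The symbols $x_i$ (the individual values), $\bar{x}$ (their mean), $\sigma^2$ (population variance), and $s^2$ (sample variance) are notation chosen here for illustration; the text above only names $n$.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}
```

If the sales are counted in units sold, the variance is expressed in squared units, while the standard deviation is back in units sold.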
1. Which numpy function calculates the average of data?
2. What does setting ddof=1 in numpy's variance function do?
3. How do we calculate standard deviation with the numpy library?