Implementing Spread in Python | Probability & Statistics
Mathematics for Data Science

Implementing Spread in Python

In this notebook, you'll learn how to calculate three fundamental statistical measures: mean, variance, and standard deviation.
These concepts are key for summarizing and understanding data in fields like data science and analytics.

We'll use a practical example, daily sales numbers, to walk through how to compute these statistics and print the results clearly.

Step 1: Define the Dataset

Here, we assign an array to the variable data to ensure we have a consistent dataset to work with for all calculations.

import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

Step 2: Calculate Population Statistics

np.mean takes the array as input and returns the average value of all elements, which summarizes the central tendency of the dataset. np.var and np.std then measure how widely the values spread around that mean.

mean_val = np.mean(data)       # Mean
variance_val = np.var(data)    # Population variance (ddof=0 by default)
std_dev_val = np.std(data)     # Population standard deviation
  • np.mean(data) computes the arithmetic mean (average).
  • np.var(data) calculates the population variance (divides by $n$).
  • np.std(data) calculates the population standard deviation (square root of variance).
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

mean_val = np.mean(data)       # Mean
variance_val = np.var(data)    # Population variance (ddof=0 by default)
std_dev_val = np.std(data)     # Population standard deviation

print(f"Mean: {mean_val}")
print(f"Variance (Population): {variance_val}")
print(f"Standard Deviation (Population): {std_dev_val}")
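As a cross-check on the population statistics, the same quantities can be written out by hand: the variance is the sum of squared deviations from the mean divided by $n$. The sketch below is illustrative only; the names n, manual_mean, squared_deviations, manual_variance, and manual_std_dev are not part of the lesson and are introduced here as assumptions.

```python
import numpy as np

# Same daily sales data as in Step 1
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

n = data.size                                    # number of observations
manual_mean = data.sum() / n                     # arithmetic mean
squared_deviations = (data - manual_mean) ** 2   # squared distance of each value from the mean
manual_variance = squared_deviations.sum() / n   # population variance: divide by n
manual_std_dev = np.sqrt(manual_variance)        # population standard deviation

# Each comparison should hold if the hand-written formulas match NumPy's defaults (ddof=0)
print(np.isclose(manual_mean, np.mean(data)))
print(np.isclose(manual_variance, np.var(data)))
print(np.isclose(manual_std_dev, np.std(data)))
```

np.isclose is used instead of == because floating-point results computed along different paths can differ in the last few digits.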

Step 3: Calculate Sample Statistics

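To get an unbiased estimate of the variance from a sample, we use ddof=1. This applies Bessel's correction, dividing by $(n-1)$ instead of $n$. Passing ddof=1 to np.std takes the square root of that corrected variance; this is the usual sample standard deviation, although it is not itself strictly unbiased.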

sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)
  • np.var(data, ddof=1) → sample variance;
  • np.std(data, ddof=1) → sample standard deviation.
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)

print(f"Variance (Sample): {sample_variance_val}")
print(f"Standard Deviation (Sample): {sample_std_dev_val}")
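To see what Bessel's correction changes numerically, note that the sample variance equals the population variance rescaled by $n / (n-1)$, since both share the same sum of squared deviations. The following is a minimal sketch of that relationship; the variable names (population_variance, sample_variance, rescaled_variance) are hypothetical and chosen for illustration.

```python
import numpy as np

# Same daily sales data as in Step 1
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

n = data.size
population_variance = np.var(data)          # divides by n (ddof=0)
sample_variance = np.var(data, ddof=1)      # divides by n - 1 (Bessel's correction)

# Rescaling the population variance by n / (n - 1) should reproduce the sample variance
rescaled_variance = population_variance * n / (n - 1)
print(np.isclose(rescaled_variance, sample_variance))

# Dividing by the smaller number n - 1 makes the sample variance the larger of the two
print(sample_variance > population_variance)
```

With only ten observations the correction is noticeable; as $n$ grows, the factor $n / (n-1)$ approaches 1 and the two estimates converge.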
Note

Standard deviation is the square root of variance, giving a measure of spread in the same units as the original data, making it easier to interpret.
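The relationship in this note can be checked directly, and the shared units make a simple interpretation possible: counting how many days fall within one standard deviation of the mean. This is only a sketch; std_from_variance, lower, upper, and within_one_std are illustrative names, not part of the lesson.

```python
import numpy as np

# Same daily sales data as in Step 1
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

# Standard deviation is the square root of variance (population version, ddof=0)
std_from_variance = np.sqrt(np.var(data))
print(np.isclose(std_from_variance, np.std(data)))

# Because the standard deviation is in sales units, it can be added to the mean directly
mean_val = np.mean(data)
lower = mean_val - std_from_variance
upper = mean_val + std_from_variance

# Count the days whose sales lie within one standard deviation of the mean
within_one_std = int(np.sum((data >= lower) & (data <= upper)))
print(f"Days within one standard deviation of the mean: {within_one_std} of {data.size}")
```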

1. Which numpy function calculates the average of data?

2. What does setting ddof=1 in numpy's variance function do?

3. How do we calculate standard deviation with the numpy library?

Section 5. Chapter 8
