Calculate Variance with Python | Variance and Standard Deviation
Learning Statistics with Python

Calculate Variance with Python

Calculating Variance with NumPy

In NumPy, pass the sequence of values (in our case, a column of the dataset) to the np.var() function, like this: np.var(df['work_year']).
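To see what np.var() computes, here is a minimal sketch that compares it with the definition of variance (the mean of squared deviations from the mean). The array `values` is hypothetical toy data, not part of the course dataset; it was chosen so the result can be checked by hand (the squared deviations sum to 32, and 32 / 8 = 4).

```python
import numpy as np

# Hypothetical toy data, small enough to check by hand (the mean is 5)
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Variance from the definition: sum of squared deviations divided by N
manual_var = np.sum((values - values.mean()) ** 2) / len(values)

# The same quantity via np.var(), which divides by N by default
numpy_var = np.var(values)

print(manual_var, numpy_var)
```

Both printed values should agree, since np.var() with its default settings uses the same N denominator as the manual calculation.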

Calculating Variance with pandas

In pandas, call the .var() method on the sequence of values (in our case, a column of the dataset), like this: df['work_year'].var().

In both cases, the results are almost the same. The difference comes from the denominator: by default, np.var() divides by N (population variance), while the pandas .var() method divides by N-1 (sample variance). Check it now!

import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/ds_salaries_statistics', index_col=0)

# Calculate the variance using the function from the NumPy library
var_1 = np.var(df['salary_in_usd'])
# Calculate the variance using the method from the pandas library
var_2 = df['salary_in_usd'].var()

print('The variance using NumPy library is', var_1)
print('The variance using pandas library is', var_2)
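If the two libraries need to agree exactly, both accept a ddof ("delta degrees of freedom") argument that sets the denominator to N - ddof. The sketch below uses hypothetical toy data rather than the course dataset, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the course dataset)
values = [2, 4, 4, 4, 5, 5, 7, 9]
arr = np.array(values)
ser = pd.Series(values)

# Library defaults: NumPy divides by N, pandas divides by N-1
pop_var_np = np.var(arr)        # ddof=0 by default
sample_var_pd = ser.var()       # ddof=1 by default

# Overriding ddof makes each library reproduce the other's result
sample_var_np = np.var(arr, ddof=1)
pop_var_pd = ser.var(ddof=0)

print('Population variance:', pop_var_np, pop_var_pd)
print('Sample variance:', sample_var_np, sample_var_pd)
```

On a large column such as salary_in_usd, the two defaults differ only by a factor of N / (N-1), which is why the results above are almost the same.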

Section 3. Chapter 3