Calculating the Pearson Coefficient Using NumPy and Pandas | Correlation
Explore the Linear Regression Using Python
1. What is the Linear Regression?
2. Correlation
3. Building and Training Model
4. Metrics to Evaluate the Model
5. Multivariate Linear Regression

Calculating the Pearson Coefficient Using NumPy and Pandas

Let's look at how to calculate the correlation coefficient when the data is stored in NumPy arrays (np.array). NumPy provides many statistical routines that simplify the calculation. Here we use the function np.corrcoef(), which accepts two arrays of the same length:

    # Import the library
    import numpy as np

    # Define the arrays
    x = np.array([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
    y = np.array([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

    # Compute the correlation matrix
    r = np.corrcoef(x, y)

This function returns the correlation matrix, a 2-dimensional array of correlation coefficients. For two input arrays it is a 2×2 array laid out as follows:

    [[r(x, x), r(x, y)],
     [r(y, x), r(y, y)]]

The upper-right value is the correlation coefficient between x and y, and the lower-left value is the coefficient between y and x. The two are equal, and this is the value we are usually interested in. The diagonal values are the correlation coefficients of x with x and of y with y; they are always equal to one.
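As a quick check of the structure described above, the following sketch inspects the matrix returned by np.corrcoef() for the same x and y arrays. It only verifies the shape, the symmetry, and the diagonal; it is an illustration rather than part of the original lesson code.

```python
import numpy as np

x = np.array([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = np.array([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

r = np.corrcoef(x, y)

# Two input arrays give a 2x2 correlation matrix
print(r.shape)

# The matrix is symmetric: r[0, 1] (x with y) equals r[1, 0] (y with x)
print(np.allclose(r, r.T))

# Each variable is perfectly correlated with itself, so the diagonal is all ones
print(np.allclose(np.diag(r), 1.0))
```

Because the matrix is symmetric, either off-diagonal element can be used as the Pearson coefficient.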

If you want only the Pearson coefficient between x and y, select a single off-diagonal element of the matrix:

    print(np.corrcoef(x, y)[0, 1])
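To see where this number comes from, the sketch below computes the Pearson coefficient directly from its standard definition (the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations) and compares it with the NumPy result. The names dx, dy, r_manual, and r_numpy are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = np.array([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Deviations of each value from its array's mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson coefficient from the definition
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# The same coefficient taken from the correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual)
print(r_numpy)
print(np.isclose(r_manual, r_numpy))
```

The two values should agree up to floating-point rounding, which is why np.isclose() is used instead of an exact comparison.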

Pandas can also calculate the correlation coefficient for two Series objects of the same length. Use the .corr() method:

    # Import the library
    import pandas as pd

    # Define the Series
    x = pd.Series([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
    y = pd.Series([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

    # Print the correlation coefficients
    print(x.corr(y))
    print(y.corr(x))
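The sketch below checks that the pandas and NumPy approaches give the same coefficient for this data, and that the order of the two Series does not matter. The final lines show DataFrame.corr(), which returns the full correlation matrix with labeled rows and columns; it goes slightly beyond what the lesson covers, and the column names "x" and "y" are chosen here for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = pd.Series([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Pearson coefficient via pandas (Pearson is the default method)
r_pandas = x.corr(y)

# The same coefficient via NumPy, using the underlying arrays
r_numpy = np.corrcoef(x.to_numpy(), y.to_numpy())[0, 1]

print(np.isclose(r_pandas, r_numpy))     # both libraries agree
print(np.isclose(x.corr(y), y.corr(x)))  # correlation is symmetric

# Labeled correlation matrix for a DataFrame holding both columns
df = pd.DataFrame({"x": x, "y": y})
corr_matrix = df.corr()
print(corr_matrix.loc["x", "y"])
```

Series.corr() returns a single number, while DataFrame.corr() returns a matrix analogous to the one produced by np.corrcoef().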

Task

You are given a dataset of Abyssinian cats' weight and height (the x and y arrays, respectively). Find the correlation coefficient between x and y using both approaches discussed in this chapter.

  1. [Lines #2-3] Import the pandas and numpy libraries.
  2. [Lines #10-11] Convert the arrays to np.array objects and find the correlation coefficient between x and y.
  3. [Line #12] Print the correlation coefficient found this way.
  4. [Lines #15-16] Convert the arrays to pandas Series objects and find the correlation coefficient between x and y.
  5. [Line #17] Print the correlation coefficient found this way.


Section 2. Chapter 3
