What is Correlation? | Covariance and Correlation
What is Correlation?

Correlation is a statistical measure that quantifies the linear relationship between two variables. It is defined as the covariance scaled by the product of the two variables' standard deviations. Because of this scaling, correlation tells us the strength of the dependence in addition to its direction.
Correlation ranges between -1 and 1, where:
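In symbols, the scaling can be written as follows. The notation here is an assumption, since the lesson itself does not fix any symbols: $\rho_{X,Y}$ denotes the correlation, $\operatorname{Cov}(X, Y)$ the covariance, and $\sigma_X$, $\sigma_Y$ the standard deviations of the two variables.

```latex
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho_{X,Y} \le 1
```

Dividing by $\sigma_X \sigma_Y$ removes the units of the two variables, which is why correlations of different pairs of variables can be compared directly while covariances cannot.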

  1. If the correlation is +1, the values have a perfect positive linear relationship. As one variable increases, the other variable increases proportionally;
  2. If the correlation is -1, the values have a perfect negative linear relationship. As one variable increases, the other variable decreases proportionally;
  3. If the correlation is close to 0, there is little or no linear relationship between the variables. Note that the variables can still be dependent in a nonlinear way.
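The three cases can be checked on small noise-free examples. This is a minimal sketch, not part of the original lesson: the particular slopes, intercepts, and the choice of y = x² as a nonlinear example are illustrative assumptions.

```python
import numpy as np

# Evenly spaced x values, symmetric around 0 (illustrative choice)
x = np.linspace(-5, 5, 101)

# Perfect positive linear relationship: y grows proportionally with x
r_pos = np.corrcoef(x, 2 * x + 3)[0, 1]

# Perfect negative linear relationship: y shrinks proportionally as x grows
r_neg = np.corrcoef(x, -0.5 * x + 1)[0, 1]

# Nonlinear dependence: y is fully determined by x, yet the linear
# correlation is (numerically) zero because x is symmetric around 0
r_sq = np.corrcoef(x, x ** 2)[0, 1]

print('Positive linear:', round(r_pos, 3))
print('Negative linear:', round(r_neg, 3))
print('Quadratic:', round(abs(r_sq), 3))
```

The quadratic case shows why a correlation near 0 only rules out a linear relationship, not dependence in general.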

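To calculate the correlation, we can follow the same steps as for covariance, using np.corrcoef(x, y) instead. This function returns a 2x2 correlation matrix, so we take the element at [0, 1], which is the correlation between x and y: np.corrcoef(x, y)[0, 1].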
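To make the link to covariance concrete, the sketch below computes the correlation by hand and compares it with np.corrcoef. It assumes that covariance is obtained with np.cov(x, y)[0, 1], which the lesson does not show here; the sample data and the seed are arbitrary.

```python
import numpy as np

# Arbitrary sample data (assumption): y depends on x plus noise
rng = np.random.default_rng(0)
x = rng.random(100) * 10
y = x + rng.standard_normal(100)

# Sample covariance; np.cov normalizes by N - 1 by default
cov_xy = np.cov(x, y)[0, 1]

# Scale by the sample standard deviations (ddof=1 to match np.cov)
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Full 2x2 correlation matrix: diagonal entries are 1,
# off-diagonal entries are the correlation between x and y
corr_matrix = np.corrcoef(x, y)
r_numpy = corr_matrix[0, 1]

print('Manual:', round(r_manual, 3))
print('NumPy: ', round(r_numpy, 3))
```

Both values agree up to floating-point error, and corr_matrix[0, 1] equals corr_matrix[1, 0] because correlation is symmetric.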

    import matplotlib.pyplot as plt
    import numpy as np

    np.random.seed(0)  # Set random seed once so all three panels are reproducible

    # Create a figure with three subplots
    fig, axes = plt.subplots(1, 3)
    fig.set_size_inches(10, 5)

    # Positive linear dependence
    x = np.random.rand(100) * 10  # Generate random x values
    y = x + np.random.randn(100)  # Generate y values with added noise
    axes[0].scatter(x, y)  # Scatter plot of x and y
    axes[0].set_title('Correlation is ' + str(round(np.corrcoef(x, y)[0, 1], 3)))  # Set title with correlation coefficient

    # Negative linear dependence
    x = np.random.rand(100) * 10  # Generate random x values
    y = -x + np.random.randn(100)  # Generate y values with added noise
    axes[1].scatter(x, y)  # Scatter plot of x and y
    axes[1].set_title('Correlation is ' + str(round(np.corrcoef(x, y)[0, 1], 3)))  # Set title with correlation coefficient

    # Independent
    x = np.random.rand(200)  # Generate random x values
    y = np.random.rand(200)  # Generate random y values
    axes[2].scatter(x, y)  # Scatter plot of x and y
    axes[2].set_title('Correlation is ' + str(round(np.corrcoef(x, y)[0, 1], 3)))  # Set title with correlation coefficient

    plt.show()  # Display the plot

Section 5. Chapter 2