Learn Histogram | More Statistical Plots
Ultimate Visualization with Python

Histogram

Note
Definition

Histograms represent the frequency or probability distribution of a variable using adjacent vertical bars of equal width, one for each interval (bin) of values.

The pyplot module provides the hist function to create histograms. The required parameter is the data (x), which can be an array or a sequence of arrays. If multiple arrays are passed, each is shown in a different color.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Loading the dataset with the average yearly temperatures in Boston and Seattle
    url = 'https://staging-content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Creating a histogram
    plt.hist(weather_df['Seattle'])
    plt.show()
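To illustrate passing a sequence of arrays, here is a minimal sketch. It uses randomly generated stand-in values rather than the course dataset, so the names `city_a` and `city_b` and all the numbers are assumptions made purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in data (NOT the course dataset): two samples of 26 values each
rng = np.random.default_rng(0)
city_a = rng.normal(loc=51, scale=1.0, size=26)
city_b = rng.normal(loc=53, scale=1.0, size=26)

# Passing a list of arrays draws one group of bars per array, each in its own color
counts, edges, bar_groups = plt.hist([city_a, city_b], label=['City A', 'City B'])
plt.legend()
plt.show()
```

Both samples share the same bin edges, so the bars can be compared interval by interval.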

Intervals and Height

A Series object containing average yearly temperatures in Seattle was passed to the hist() function. By default, the data is divided into 10 equal intervals ranging from the minimum to the maximum value. However, only 9 bins are visible because the second interval contains no data points.

By default, the height of each bar equals the frequency of its interval, that is, the number of values that fall within it.
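The default binning can be inspected numerically with `np.histogram`, which `hist` relies on for its counts and which also defaults to 10 equal-width bins. The sample below is made up for illustration and is chosen so that the second interval is empty, as in the Seattle plot.

```python
import numpy as np

# Made-up sample (an assumption for illustration) with no values between 1 and 2
values = np.array([0.0, 0.5, 2.5, 3.0, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.0])

# By default, 10 equal-width bins span the range from the minimum to the maximum
counts, edges = np.histogram(values)

print(edges)   # 11 edges: 0, 1, 2, ..., 10
print(counts)  # [2 0 1 2 1 1 1 1 1 2]
```

The zero in the second position corresponds to an interval that exists but has no visible bar.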

Number of Bins

Another important, though optional, parameter is bins. It accepts an integer (the number of bins), a sequence of numbers (the bin edges), or a string (the name of a binning rule). In most cases, passing the number of bins is sufficient.
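The three forms of `bins` can be compared with `np.histogram`, which accepts the same kinds of values. This is only a sketch on a made-up sample; the specific edges and the `'sturges'` rule name are chosen here for illustration.

```python
import numpy as np

# Made-up sample (an assumption for illustration)
values = np.array([0.0, 0.5, 2.5, 3.0, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.0])

counts_int, edges_int = np.histogram(values, bins=4)              # number of bins
counts_seq, edges_seq = np.histogram(values, bins=[0, 2, 5, 10])  # explicit bin edges
counts_str, edges_str = np.histogram(values, bins='sturges')      # named binning rule

print(counts_int)  # [2 4 2 4]
print(counts_seq)  # [2 4 6]
print(len(edges_str) - 1)  # number of bins chosen by the named rule
```

With explicit edges the bins no longer need to have equal width, which is why the last interval here collects half of the values.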

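There are several methods for choosing the number of histogram bins (and therefore their width). In this example, we'll use Sturges' formula, which derives a suggested number of bins from the sample size:

$$\text{bins} = 1 + \lfloor \log_2 n \rfloor$$

Here, n is the size of the data array, and the logarithm is rounded down, as in the code below.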

Note
Study More

You can explore additional methods for bin calculation here.

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    url = 'https://staging-content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Specifying the number of bins
    plt.hist(weather_df['Seattle'], bins=1 + int(np.log2(len(weather_df))))
    plt.show()

The DataFrame has 26 rows (the size of the Series). Since log2(26) ≈ 4.7 is truncated to 4 by int(), the resulting number of bins is 1 + 4 = 5.
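The arithmetic can be checked directly. The helper name `sturges_bins` below is hypothetical and simply wraps the expression used in the example above.

```python
import math

def sturges_bins(n):
    """Number of bins as computed in the example: 1 + log2(n), truncated."""
    return 1 + int(math.log2(n))

for n in (26, 100, 1000):
    print(n, sturges_bins(n))
# 26 5
# 100 7
# 1000 10
```

The bin count grows slowly: increasing the sample size from 26 to 1000 only doubles the number of bins.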

Probability Density Approximation

To view an approximation of the probability density, set the density parameter to True in the hist function.

Now, each bin's height is calculated using:

$$\text{Height} = \frac{m}{n \times w}$$

where:

  • n - the total number of values in the dataset;

  • m - the number of values in the bin;

  • w - the width of the bin.

This ensures that the total area under the histogram is 1, which matches the key property of a probability density function (PDF).
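As a quick check of this formula, the sketch below computes the heights by hand and compares them with what `np.histogram` returns for `density=True`. The sample is made up for illustration.

```python
import numpy as np

# Made-up sample (an assumption for illustration)
values = np.array([0.0, 0.5, 2.5, 3.0, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.0])
n = len(values)

# m: number of values per bin, w: width of each bin
counts, edges = np.histogram(values, bins=5)
widths = np.diff(edges)

# Height = m / (n * w)
manual_heights = counts / (n * widths)
density_heights, _ = np.histogram(values, bins=5, density=True)

print(np.allclose(manual_heights, density_heights))  # True
print(np.sum(manual_heights * widths))               # total area, 1.0 up to rounding
```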

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    url = 'https://staging-content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Making a histogram a probability density function approximation
    plt.hist(weather_df['Seattle'], bins=1 + int(np.log2(len(weather_df))), density=True)
    plt.show()

This provides an approximation of the probability density function for the temperature data.

Note
Study More

To learn more about the hist() parameters, refer to the hist() documentation.

Task

Create an approximation of a probability density function using a sample from the standard normal distribution:

  1. Use the correct function for creating a histogram.
  2. Use normal_sample as the data for the histogram.
  3. Specify the number of bins as the second argument using Sturges' formula.
  4. Make the histogram an approximation of a probability density function by correctly specifying the rightmost argument.
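One possible way to approach these steps is sketched below. It assumes that `normal_sample` is provided by the exercise environment; since its actual size is not stated here, the sketch generates a sample of 1000 values locally as a stand-in.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: the exercise supplies normal_sample; here it is generated locally
rng = np.random.default_rng(42)
normal_sample = rng.standard_normal(1000)

# Sturges' formula in the same form as used earlier in this chapter
n_bins = 1 + int(np.log2(len(normal_sample)))

# Data first, number of bins second, density=True as the rightmost argument
heights, edges, _ = plt.hist(normal_sample, n_bins, density=True)
plt.show()
```

With 1000 values the formula gives 10 bins, and the bar areas add up to 1.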
