Ultimate Visualization with Python
KDE Plot
A kernel density estimation (KDE) plot visualizes an estimate of the probability density function of the data. It is similar to the histogram discussed in the previous section. However, a KDE plot is a continuous curve rather than a set of bars, and it is built from every individual data point rather than from counts within intervals. Let’s have a look at an example of a KDE plot:
As you can see, here we have a histogram combined with a KDE plot (orange curve). This combination gives a clearer approximation of the probability density function than a histogram alone.
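To make the idea of a curve built from every data point more concrete, here is a minimal sketch that computes a Gaussian KDE by hand with NumPy and draws it over a histogram. The sample values, the bandwidth of 1.5 and the helper name gaussian_kde_by_hand are assumptions made up for illustration; seaborn picks the bandwidth automatically, so its curve will differ in detail.

```python
import numpy as np
import matplotlib.pyplot as plt


def gaussian_kde_by_hand(samples, grid, bandwidth):
    """Average one Gaussian bump per data point, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every grid point to every sample, in bandwidth units
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Every sample contributes equally, so the estimate is the mean bump
    return bumps.mean(axis=1)


# Hypothetical temperatures in °F (not the course dataset)
samples = np.array([49.5, 50.8, 51.2, 52.0, 52.3, 53.1, 54.6])
grid = np.linspace(40, 65, 501)
density = gaussian_kde_by_hand(samples, grid, bandwidth=1.5)

# Histogram scaled to a density, with the KDE curve drawn on top
fig, ax = plt.subplots()
ax.hist(samples, bins=5, density=True, alpha=0.5)
ax.plot(grid, density, color='orange')
ax.set_xlabel('Temperature (°F)')
ax.set_ylabel('Density')
plt.show()
```

In practice there is no need to write the estimator yourself. seaborn can draw a similar combined figure with sns.histplot(samples, kde=True, stat='density').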
With seaborn, creating a KDE plot is as simple as it gets, since there is a dedicated kdeplot() function. Its most important parameters, data, x and y, work the same way as in the countplot() function.
First Option
The simplest option is to set only one of these parameters by passing a sequence of values. Here is an example:
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    url = 'https://staging-content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'

    # Loading the dataset with the average yearly temperatures in Boston and Seattle
    weather_df = pd.read_csv(url, index_col=0)

    # Creating a KDE plot setting only the data parameter
    sns.kdeplot(data=weather_df['Seattle'], fill=True)
    plt.show()
We only set the data parameter, passing a Series object, and use the fill parameter to fill in the area under the curve (it is not filled in by default).
Second Option
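It is also possible to pass a 2D object like a DataFrame for data and a column name (or a key if data is a dictionary) for x or y. With x, the values run along the horizontal axis and the curve rises vertically (vertical orientation). With y, the plot is rotated so that the values run along the vertical axis (horizontal orientation):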
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    url = 'https://staging-content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Creating a KDE plot setting both the data and x parameters
    sns.kdeplot(data=weather_df, x='Seattle', fill=True)
    plt.show()
We achieved the same result by passing the whole DataFrame as the data parameter and the column name as the x parameter.
By the way, the KDE plot we created has a characteristic bell shape and closely resembles a normal distribution with a mean of approximately 52°F.
If you want to explore the kdeplot() function further, feel free to refer to its documentation.
- Use the correct function to create a KDE plot.
- Use countries_df as the data for the plot (the first argument).
- Set 'GDP per capita' as the column to use and the orientation to horizontal via the second argument.
- Fill in the area under the curve via the third (rightmost) argument.