Ultimate Visualization with Python
Joint Plot
A joint plot combines several plots in a single figure. By default, it has three elements:
- a histogram on the top, which shows the distribution of one variable;
- a histogram on the right, which shows the distribution of the other variable;
- a scatter plot in the middle, which shows the relationship between the two variables.
Here is an example of a joint plot:
Data for the Joint Plot
seaborn has a jointplot() function which, similarly to countplot() and kdeplot(), has three most important parameters:
- data;
- x;
- y.
The x and y parameters are the variables we are interested in (shown in the top and right histograms respectively). They can be either array-like objects or the names of columns of a DataFrame (if the data parameter is also set to that DataFrame).
Let’s have a look at an example:
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Loading the dataset with data about three different iris flower species
    iris_df = sns.load_dataset("iris")

    sns.jointplot(data=iris_df, x="sepal_length", y="sepal_width")
    plt.show()
We have just recreated the example from the beginning by passing a DataFrame object as the data parameter and the names of its columns as x and y.
Plot in the Middle
Another quite useful parameter is kind, which specifies the plot in the middle. Its default value is 'scatter'. The other possible values are 'kde', 'hist', 'hex', 'reg' and 'resid'. Feel free to experiment with different plots:
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Loading the dataset with data about three different iris flower species
    iris_df = sns.load_dataset("iris")

    sns.jointplot(data=iris_df, x="sepal_length", y="sepal_width", kind='reg')
    plt.show()
Plot Kinds
Although a scatter plot is the most common choice for the plot in the middle, here is what the other kinds do:
- 'reg' fits a linear regression model and draws it along with the scatter plot, which is useful for checking whether two variables are correlated;
- 'resid' plots the residuals of a linear regression (see the documentation);
- 'hist' creates a bivariate histogram (for two variables);
- 'kde' creates a KDE plot;
- 'hex' creates a hexbin plot: hexagonal bins are drawn instead of individual data points, and the color of each bin indicates how many data points fall within it.
As usual, feel free to explore more parameters in the documentation.
- Use the correct function to create a joint plot.
- Use weather_df as the data for the plot (the first argument).
- Set the 'Boston' column for the x-axis variable (the second argument).
- Set the 'Seattle' column for the y-axis variable (the third argument).
- Set the plot in the middle to have a regression line (the rightmost argument).