Explore the Linear Regression Using Python
Useful Functions for Researching and Visualizing Dataset
To get information about our dataset, we can also use the .describe() method. It returns summary statistics for each numeric column: count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum. For example, we can add this line to the previous code to show these characteristics of the wines:
print(data.describe())
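As a minimal sketch of what the summary contains, the snippet below lists the row labels returned by .describe() and compares the 50% row with the median computed directly. It assumes the same wine dataframe as in the lesson; the variable name summary is only illustrative.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Build the dataframe the same way as in the lesson
wine = load_wine()
data = pd.DataFrame(data=wine['data'], columns=wine['feature_names'])

# 'summary' is an illustrative name for the table returned by describe()
summary = data.describe()

# Row labels of the summary table (one row per statistic)
print(list(summary.index))

# The median is not listed by name: it is the '50%' row
print(summary.loc['50%', 'alcohol'], data['alcohol'].median())
```

Both printed values in the last line should agree, which is why the lesson can speak of the median even though the table labels it 50%.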
If so many digits are inconvenient to work with, we can apply the .round(n) method to the result, where n is the number of decimal places to round to.
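A short sketch of the rounding step, assuming the same wine dataframe as in the lesson and n = 2 chosen purely for illustration:

```python
import pandas as pd
from sklearn.datasets import load_wine

# Build the dataframe the same way as in the lesson
wine = load_wine()
data = pd.DataFrame(data=wine['data'], columns=wine['feature_names'])

# Round every statistic to two decimal places (n = 2 is an arbitrary choice)
summary = data.describe().round(2)

# Show the rounded statistics for a single feature
print(summary['alcohol'])
```

Rounding affects only the displayed summary table; the values stored in data are left unchanged.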
Numbers are great, but it's still not entirely clear what they represent. For this reason, we will use visualization to see what’s going on in our dataset. It is good practice to look at the data before building the regression line. The well-known library matplotlib.pyplot is very handy here, and pandas relies on it for plotting. To see all our data as histograms, just call the .hist() method on the dataframe. To get only the chart we are interested in, we can pass its name through the column argument. Moreover, we can set the number of bins with the bins argument.
For example, here we will see the histogram of color intensity with 20 bins:
# Import the libraries
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine

# Load dataset
wine = load_wine()

# Configure pandas to show all features
pd.set_option('display.max_rows', None, 'display.max_columns', None)

# Convert data to a dataframe to view properly
data = pd.DataFrame(data = wine['data'], columns = wine['feature_names'])

# Get all information about our dataset
print(data.describe())

# Visualize the data
data.hist(column = 'color_intensity', bins = 20)
plt.show()
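The lesson also mentions that calling .hist() without a column draws a histogram for every feature. A possible sketch of that is shown here; the figsize value and the file name wine_histograms.png are assumptions made for illustration, and the figure is saved to a file so the script also runs without a display (plt.show() can be used instead in an interactive session).

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine

# Build the dataframe the same way as in the lesson
wine = load_wine()
data = pd.DataFrame(data=wine['data'], columns=wine['feature_names'])

# No 'column' argument: pandas draws one histogram per feature
# (figsize is an illustrative value, not taken from the lesson)
axes = data.hist(bins=20, figsize=(12, 10))

# Avoid overlapping titles and save the grid of charts
# ('wine_histograms.png' is a hypothetical file name)
plt.tight_layout()
plt.savefig('wine_histograms.png')
```

Seeing all the features side by side makes it easier to pick the ones worth a closer look with the column argument.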
Task
Let’s see what's going on inside our set.
- [Line #7] Load the wine set.
- [Line #16] Get and print all information about it using the .describe() method.
- [Line #19] Visualize the alcohol content in the set, setting column = 'alcohol' and bins = 15.