Explore the Linear Regression Using Python
Train-test split evaluation
How do we build a model that predicts future values? In this section, we will use sklearn (scikit-learn) to develop and train our model. First, import the LinearRegression class and create an instance of it:
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
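Creating the model only sets it up; it learns nothing until `fit()` is called. As a minimal sketch of what the initialized model does once trained, the snippet below fits it to a small hypothetical dataset that follows y = 2x + 1 exactly, so the learned slope (`coef_`) and intercept (`intercept_`) should come out close to 2 and 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follows y = 2x + 1 exactly (for illustration only)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # features must be 2-D: (n_samples, n_features)
y = 2 * X[:, 0] + 1

model = LinearRegression()
model.fit(X, y)  # learn the slope (coef_) and the intercept (intercept_)

print(model.coef_, model.intercept_)
print(model.predict(np.array([[5.0]])))  # prediction for an unseen x = 5
```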
Now that the model is initialized, the next step is to split the data. We will use the train-test split technique to evaluate how well a machine learning algorithm performs. The data is split into two subsets:
- Train Dataset: Used to train our model.
- Test Dataset: Used to evaluate the fitted model.
The first subset is used to fit the model, while the second is used to make predictions and compare them with the expected values. However, this procedure is not recommended for small datasets, since there may not be enough data both to train the model and to evaluate it reliably.
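A minimal sketch of this workflow, assuming a synthetic dataset (y = 3x + 5 plus Gaussian noise) rather than the lesson's data: the model is fitted only on the training subset, and the test subset is used to compare its predictions with the held-out expected values. The metrics chosen here (mean absolute error and `score()`, which returns R² for `LinearRegression`) are one reasonable option, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 5 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)     # the training subset is used to fit the model
y_pred = model.predict(X_test)  # the test subset is used only for predictions

# Compare the predictions with the expected values held out in y_test
mae = np.mean(np.abs(y_pred - y_test))
r2 = model.score(X_test, y_test)  # coefficient of determination (R^2) on unseen data
print(f"MAE: {mae:.3f}, R^2: {r2:.3f}")
```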
The function we will use has one main configuration parameter: the fraction of the data (from 0 to 1) that is used for training or testing. For example, a training set size of 0.8 (80%) means that the remaining 0.2 (20%) goes to the test set. There is no optimal rule for the split percentage; it depends on your goals, computational costs, how representative each subset is, and other factors. A 70-30 split (70% of the data for training and 30% for testing) is a common starting point.
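To see that specifying the training fraction or the test fraction describes the same split, here is a small sketch on a hypothetical 10-row array: `train_size=0.8` and `test_size=0.2` both produce 8 training rows and 2 test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 hypothetical samples with 2 features each

# Specifying the training fraction...
a_train, a_test = train_test_split(X, train_size=0.8, random_state=0)
# ...or the test fraction gives the same 8/2 split
b_train, b_test = train_test_split(X, test_size=0.2, random_state=0)

print(len(a_train), len(a_test))
print(len(b_train), len(b_test))
```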
In this section, we will use the train_test_split() function. It takes the features (x) and the target (y), along with the size of the test (or training) set, and returns a training and a test subset of each, four arrays in total:
    from sklearn.model_selection import train_test_split

    X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3)
    print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
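The snippet above assumes that `x` and `y` already exist. A self-contained version, assuming `x` is the wine feature matrix and `y` is `wine.target`, looks like this. The wine dataset has 178 rows and 13 features; 30% of 178 is 53.4, which scikit-learn rounds up to 54 test rows, leaving 124 for training.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
x = wine.data    # assumption: all 13 features are used as inputs
y = wine.target  # one target value per row

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```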
wine.target is an attribute of the object returned by the load_wine() function, which we imported from sklearn.datasets. It holds the values of the dataset that we are trying to predict.
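A quick way to see what load_wine() returns is to inspect its attributes. The sketch below prints the shape of the feature matrix (`data`), the shape of the target values (`target`), the class names, and checks that flavanoids is one of the feature names:

```python
from sklearn.datasets import load_wine

wine = load_wine()  # a Bunch: a dict-like object that also allows attribute access

print(type(wine).__name__)
print(wine.data.shape)    # feature matrix: one row per sample
print(wine.target.shape)  # target values: one per sample
print(wine.target_names)  # names of the target classes
print("flavanoids" in wine.feature_names)
```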
Rows are assigned to the two subsets at random, so that each subset is a representative sample of the original dataset. When comparing algorithms, it is often important that they are fitted and evaluated on the same subsets. To guarantee this, fix the seed of the pseudo-random number generator with the random_state parameter of train_test_split().
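The following sketch shows what fixing the seed buys you: two calls with the same `random_state` return identical subsets, while a different seed produces a different assignment of rows. The seed values 42 and 7 are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
x, y = wine.data, wine.target

# Same seed: identical subsets on every call
s1 = train_test_split(x, y, test_size=0.3, random_state=42)
s2 = train_test_split(x, y, test_size=0.3, random_state=42)
same = all(np.array_equal(a, b) for a, b in zip(s1, s2))

# Different seed: a different assignment of rows to the subsets
s3 = train_test_split(x, y, test_size=0.3, random_state=7)
different = not np.array_equal(s1[0], s3[0])

print(same, different)
```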
Task
Try to split the wine dataset.
- [Line #8] Load the wine dataset.
- [Line #17] Set the target using the .target attribute. In this case, it’s flavanoids.
- [Line #25] Split the data 60-40 (60% of the data for training and 40% for testing) and set the random_state parameter to 2.
- [Line #28] Print the variable Y_train.
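One possible approach to the task is sketched below. The exercise's starter code is not shown here, so this sketch assumes that flavanoids (a column of wine.data) is the target and that the remaining 12 columns are the features. With 178 rows, a 40% test share rounds up to 72 test rows, leaving 106 for training.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Load the wine dataset
wine = load_wine()

# Assumption: flavanoids is the target; the other 12 columns are the features
idx = wine.feature_names.index("flavanoids")
y = wine.data[:, idx]
x = np.delete(wine.data, idx, axis=1)

# 60-40 split with the random seed fixed to 2
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=2)

print(Y_train)
```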