Train-split evaluation | Building and Training Model
Explore the Linear Regression Using Python
1. What is the Linear Regression?
2. Correlation
3. Building and Training Model
4. Metrics to Evaluate the Model
5. Multivariate Linear Regression

Train-split evaluation

How do we build a model to predict future values? In this section, we will work with sklearn to develop and train our model. First, import the LinearRegression class and create a model instance:

    from sklearn.linear_model import LinearRegression

    model = LinearRegression()

We have initialized the model we will work with. Second, we have to split the data. We will use the train-test split technique to evaluate the performance of a machine learning algorithm. Split the data into two subsets:

  • Train Dataset: Used to train our model.
  • Test Dataset: Used to evaluate the fitted model.

The first subset is used to fit the model, while the second is used to make predictions and compare them with the expected values. However, this procedure is not well suited to small datasets, where neither subset may contain enough rows to be representative.
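The fit-then-evaluate workflow described above can be sketched as follows. This is a minimal illustration, not the course's own solution: the synthetic data (y = 3x + 2 plus noise), the seed values, and the use of score() as the comparison step are all assumptions made for the example. The split itself uses scikit-learn's train_test_split().

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 * x[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 30% of the rows for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, Y_train)            # fit on the training subset only

Y_pred = model.predict(X_test)         # predictions for rows the model has not seen
r2_test = model.score(X_test, Y_test)  # R^2 of predictions vs. expected values

print("first predictions:", Y_pred[:3])
print("expected values:  ", Y_test[:3])
print("test R^2:", round(r2_test, 3))
```

Because the model never sees the test rows during fitting, the test score gives a more honest picture of how it may behave on new data than a score computed on the training rows.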

The function we will use has one main configuration parameter: the fraction of the data (a number between 0 and 1) assigned to the training or the test set. For example, a training set of size 0.8 (80%) leaves the remaining 0.2 (20%) for the test set. There is no optimal split percentage; the choice depends on your goals, computational costs, how representative each subset is, and other factors. A 70-30 split (70% of the data for training, 30% for testing) is a common starting point.
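To see how the split fraction translates into row counts, here is a small sketch using scikit-learn's train_test_split() with its test_size parameter. The 100-row toy arrays and the particular fractions tried are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 100 rows, one feature.
x = np.arange(100).reshape(-1, 1)
y = np.arange(100)

sizes = {}
for test_size in (0.2, 0.3, 0.4):
    X_train, X_test, Y_train, Y_test = train_test_split(
        x, y, test_size=test_size, random_state=0
    )
    # Record how many rows ended up in each subset.
    sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size}: train rows={len(X_train)}, test rows={len(X_test)}")
```

A larger test fraction gives a steadier estimate of performance but leaves fewer rows for fitting, which is the trade-off behind the choice of split.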

In this section we will work with the train_test_split() function. It takes the dataset (x and y) and the size of the test or train subset, and returns four arrays: the training and test parts of x, followed by the training and test parts of y:

    from sklearn.model_selection import train_test_split

    X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3)
    print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

wine.target is an attribute of the object returned by load_wine(), a function imported from sklearn.datasets. This attribute holds the values we are trying to predict.
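As a quick way to inspect what load_wine() returns, the sketch below prints the shapes of the data and target attributes. The variable names x and y are assumptions chosen to match the snippet above. Note that in the dataset as shipped with scikit-learn, target holds the class labels, while flavanoids is one of the columns listed in feature_names.

```python
from sklearn.datasets import load_wine

wine = load_wine()

x = wine.data    # feature matrix, one row per wine sample
y = wine.target  # values stored under the .target attribute

print(x.shape, y.shape)
print(wine.feature_names)
```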

The rows are assigned to the two sets at random, so that each set is a representative sample of the original dataset. When comparing algorithms, it is often important that they are fitted and evaluated on the same subsets. To achieve this, fix the seed of the pseudo-random number generator by passing the random_state parameter to train_test_split().
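The effect of random_state can be checked directly: two calls with the same value should produce identical subsets. This is a minimal sketch on toy arrays; the data and the seed value 2 are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 20 rows, one feature.
x = np.arange(20).reshape(-1, 1)
y = np.arange(20)

# Two independent calls with the same seed.
first = train_test_split(x, y, test_size=0.3, random_state=2)
second = train_test_split(x, y, test_size=0.3, random_state=2)

# Compare the four returned arrays pairwise.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print("identical subsets:", same_split)
```

Without random_state, each call shuffles the rows differently, so scores from separate runs are not directly comparable.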

Task

Try to split your wine dataset.

  1. [Line #8] Load the wine dataset.
  2. [Line #17] Set a target using the .target attribute. In this case it’s flavanoids.
  3. [Line #25] Split the data 60-40 (60% of the data for training and 40% for testing) and pass 2 as the random_state value.
  4. [Line #28] Print the variable Y_train.
