Train-split evaluation | Building and Training Model
Explore the Linear Regression Using Python
1. What is the Linear Regression?
2. Correlation
3. Building and Training Model
4. Metrics to Evaluate the Model
5. Multivariate Linear Regression

Train-split evaluation

How do we build a model to predict future values? In this section, we will use sklearn to develop and train our model. First, import the LinearRegression class and create a model instance:

    from sklearn.linear_model import LinearRegression

    model = LinearRegression()

We have initialized the model we will work with. Second, we have to split the data. We will use the train-test split technique to evaluate the performance of a machine learning algorithm. Split the data into two subsets:

  • Train Dataset: Used to train our model.
  • Test Dataset: Used to evaluate the fitted model.

The first subset is used to fit the model, while the second is used to make predictions and compare them with the expected values. However, this procedure is not recommended when the dataset is small, because neither subset may contain enough data to be representative.
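To make the roles of the two subsets concrete, here is a minimal sketch. It assumes synthetic data generated with NumPy (the data, the seed, and the variable names are illustrative and are not the course dataset): the model is fitted on the training subset only, and its predictions are then compared with the expected values of the test subset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y is roughly 3*x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 * x[:, 0] + 2 + rng.normal(0, 1, size=100)

# Hold out 30% of the rows for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Fit on the training subset only.
model = LinearRegression()
model.fit(X_train, Y_train)

# Predict on the unseen test subset and compare with the expected values.
Y_pred = model.predict(X_test)
test_r2 = model.score(X_test, Y_test)  # R^2 on the test subset
print(Y_pred.shape, round(test_r2, 3))
```

Because the test rows were never seen during fitting, the score gives an estimate of how the model might behave on new data.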

The function we will use has one main configuration parameter: the fraction (from 0 to 1) of the data assigned to the training or the test set. For example, a training set of size 0.8 (80%) leaves the remaining 0.2 (20%) for the test set. There is no optimal rule for the split percentage. It depends on your goals, computational costs, how representative each subset is, and other factors, but a 70-30 split (70% of the data for training and 30% for testing) is a reasonable starting point.
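As a rough illustration of how the split fraction translates into subset sizes, the sketch below uses scikit-learn's train_test_split() on a hypothetical 100-row array (the array and the chosen fractions are assumptions made for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows with 2 features each, and 100 target values.
x = np.arange(200).reshape(100, 2)
y = np.arange(100)

sizes = {}
for test_size in (0.2, 0.3, 0.4):
    X_train, X_test, Y_train, Y_test = train_test_split(
        x, y, test_size=test_size, random_state=0
    )
    # Record (train rows, test rows) for each fraction.
    sizes[test_size] = (len(X_train), len(X_test))
    print(test_size, len(X_train), len(X_test))
```

Whatever fraction is chosen, the two subsets together always contain every row of the original data exactly once.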

In this section we will work with the train_test_split() function. It takes the dataset (x and y) and the size of the test or train subset, and returns the train and test parts of each input:

    from sklearn.model_selection import train_test_split

    # x holds the features and y the target values
    X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3)
    print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

wine.target is an attribute of the object returned by load_wine(), a function that we imported from sklearn.datasets. This attribute holds the values we are trying to predict.
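For orientation, here is a small sketch of what load_wine() returns. Note that in the stock scikit-learn dataset, wine.target holds the class labels and flavanoids is one of the feature columns in wine.data; the course exercise may prepare its target differently, so treat this only as a look at the raw object.

```python
from sklearn.datasets import load_wine

# load_wine() returns a Bunch object with attributes such as data and target.
wine = load_wine()

print(type(wine).__name__)
print(wine.data.shape)    # features: one row per wine sample
print(wine.target.shape)  # one target value per row
print("flavanoids" in wine.feature_names)
```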

The rows are assigned to the two sets at random, so that each set is a representative sample of the original dataset. When comparing algorithms, it is often important that they are fitted and evaluated on the same subsets. To ensure this, fix the seed of the pseudo-random number generator with the random_state parameter of train_test_split().
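The effect of random_state can be checked with a short sketch. The arrays and the seed value 42 below are arbitrary choices made for illustration: calling train_test_split() twice with the same seed yields the same subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows with 2 features each.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same random_state (42 is an arbitrary seed).
_, _, _, y_test_a = train_test_split(x, y, test_size=0.3, random_state=42)
_, _, _, y_test_b = train_test_split(x, y, test_size=0.3, random_state=42)

same_split = np.array_equal(y_test_a, y_test_b)
print(same_split)  # True: the same rows land in the test set both times
```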

Task

Try to split your wine dataset.

  1. [Line #8] Load the wine dataset.
  2. [Line #17] Set the target using the .target attribute. In this case it’s flavanoids.
  3. [Line #25] Split the data 60-40 (60% of the data for training and 40% for testing) and pass 2 as random_state.
  4. [Line #28] Print the variable Y_train.
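One possible shape for these steps is sketched below. The exercise file itself is not shown here, so this is an assumption-laden outline and not the expected solution: it uses wine.data as x and wine.target as y exactly as scikit-learn provides them, and it reads "2 as a random parameter" as random_state=2.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Step 1: load the wine dataset.
wine = load_wine()

# Step 2: features and target (assumed: taken directly from the loaded object).
x = wine.data
y = wine.target

# Step 3: 60-40 split, i.e. 40% of the rows go to the test set.
X_train, X_test, Y_train, Y_test = train_test_split(
    x, y, test_size=0.4, random_state=2
)

# Step 4: print the training targets.
print(Y_train)
```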
