Вивчайте Train-split evaluation | Building and Training Model

Секція 3. Розділ 1

single

Свайпніть щоб показати меню

How to build a model to predict future values? In this section, we will work with sklearn to develop and train our model. First, import LinearRegression() to create the linear regression class:


              12
            
from sklearn.linear_model import LinearRegression
model = LinearRegression()

We have initialized the model we will work with. Second, we have to split the data. We will use the train-test split technique for evaluating the method of a machine learning algorithm. Split the data into 2 categories:

Train Dataset: Used to train our model.
Test Dataset: Used to evaluate the fitted model.

The first set is used to find the model, while the second subset is used for predictions and comparison with expected values. Although, when you have a small dataset, this procedure shouldn't be used.

The function we will be using has one main configuration parameter - the percentage (from 0 to 1) of the data that is used for training or testing. For example, a training set of size 0.8 (80%) means that the remaining percentage of 0.2 (20%) goes to the test set. There is no optimal rule for the split percentage, it depends on goals, computational costs, set representativeness, and other factors, but it’s good to split data 70-30 (70% of data for training and 30% - for testing).

We will work in this section with train_test_split() function. It takes the dataset (x and y), the size of the test/train data, and returns it as output 2 subsets:


              123
            
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.3)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

wine.target is just an attribute name for the load_wine class that we imported from sklearn.datasets. This attribute gives the values of the dataset we are trying to predict.

The rows are randomly assigned to sets. This happens so that the datasets are representative samples (e.g., a random sample) of the original data set. When comparing algorithms, it is sometimes important that they fit and evaluate on the same subsets. To do this, it is desirable to fix the initial value for the pseudo-random number generator using the function parameter random_state for the above-described method.

Завдання

Проведіть, щоб почати кодувати

Try to split your wine dataset.

[Line #8] Load the wine dataset.
[Line #17] Set a target using method .target. In this case it’s flavanoids.
[Line #25] Split the data 60-40 (60% of the data is for training and 40% is for testing) and insert 2 as a random parameter.
[Line #28] Print the variable Y_train.

Рішення

Перейдіть на комп'ютер для реальної практикиПродовжуйте з того місця, де ви зупинились, використовуючи один з наведених нижче варіантів

Все було зрозуміло?

Дякуємо за ваш відгук!

Секція 3. Розділ 1

single

Запитати АІ

Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат