Logistic Regression Mastering
Train and Test Split
The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model.
It is fast and easy to perform, and its results let you compare the performance of different machine learning algorithms on your predictive modeling problem.
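As a rough illustration of the idea, the sketch below fits a logistic regression on one part of a dataset and scores it on the held-out part. The synthetic data from make_classification, the 33% test share, and the seed values are assumptions chosen for the example, not part of the course material.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hold out 33% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on the held-out rows estimates performance on new data.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

Repeating the same split with a different model and comparing the test scores is one simple way to compare algorithms on the same problem.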
Methods description
- sklearn: This module provides simple and efficient tools for data mining and data analysis. It includes various algorithms and utilities for machine learning tasks;
- model_selection: This submodule within sklearn provides tools for model selection and evaluation, including methods for splitting data into training and testing sets;
- .train_test_split(): This function splits arrays or matrices into random train and test subsets. It takes in arrays X and y representing features and target variables, respectively. The test_size parameter determines the proportion of the dataset to include in the test split. The random_state parameter sets the seed used for random sampling to ensure reproducibility. It returns four arrays: X_train, X_test, y_train, and y_test, representing the training and testing sets for features and target variables, respectively.
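The behavior described above can be checked on a small toy array. This is a minimal sketch: the array contents, test_size=0.3, and random_state=42 are arbitrary values picked for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.3 requests 30% of the samples for the test set;
# random_state fixes the shuffle so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
print(y_train.shape, y_test.shape)  # (7,) (3,)

# Calling again with the same random_state reproduces the same split.
_, X_test_again, _, y_test_again = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(np.array_equal(X_test, X_test_again))  # True
```

Rows of X and entries of y are shuffled together, so each feature row stays paired with its own label after the split.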
Task
- Import train_test_split from sklearn.model_selection.
- Define X as all the features (exclude "target").
- Define y as the "target" variable.
- Split the data into a training set (67%) and a test set (33%).
Solution
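One possible way to complete the task is sketched below. The course dataset is not shown here, so the DataFrame df, its feature_a and feature_b columns, and the seed random_state=42 are hypothetical stand-ins; only the "target" column name and the 67%/33% split come from the task.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the course dataset: any DataFrame with a
# "target" column is handled the same way.
df = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55, 0.3],
    "feature_b": [1, 0, 1, 1, 0, 0, 1, 0, 1],
    "target":    [0, 0, 0, 1, 1, 0, 1, 1, 0],
})

# X holds every column except "target"; y is the "target" column.
X = df.drop(columns="target")
y = df["target"]

# 67% train / 33% test; random_state=42 is an arbitrary seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print(len(X_train), len(X_test))
```

Passing test_size=0.33 is enough to get the 67%/33% split, because the training share defaults to the remainder of the data.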