Logistic Regression Mastering
Train and Test Split
The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model.
It is fast and easy to perform, and its results let you compare the performance of different machine learning algorithms on your predictive modeling problem.
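As a rough illustration of comparing algorithms on held-out data, the sketch below trains two classifiers on the same training portion and scores both on the same test portion. The synthetic dataset from make_classification, the choice of the two models, and the use of accuracy as the metric are all assumptions made for this example, not part of the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 33% of the rows; the models never see them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Two candidate models, chosen arbitrarily for the comparison.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # learn from the training rows only
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(name, round(scores[name], 3))
```

Because both models are scored on the same unseen rows, the two accuracy values are directly comparable.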
Methods description
- sklearn: This module provides simple and efficient tools for data mining and data analysis. It includes various algorithms and utilities for machine learning tasks;
- model_selection: This submodule within sklearn provides tools for model selection and evaluation, including methods for splitting data into training and testing sets;
- .train_test_split(): This function splits arrays or matrices into random train and test subsets. It takes in arrays X and y representing features and target variables, respectively. The test_size parameter determines the proportion of the dataset to include in the test split. The random_state parameter sets the seed used for random sampling to ensure reproducibility. It returns four arrays: X_train, X_test, y_train, and y_test, representing the training and testing sets for features and target variables, respectively.
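The behaviour described above can be sketched on a small toy array. The data and the values test_size=0.25 and random_state=42 are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 12 samples with 2 features each.
# Row i of X is [2*i, 2*i + 1] and its label is i, so pairing is easy to check.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# test_size=0.25 puts a quarter of the rows in the test set;
# random_state fixes the shuffle so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)  # (9, 2) (3, 2)

# Calling again with the same random_state yields the same split.
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(np.array_equal(X_test, X_test_2))  # True
```

Rows of X and entries of y are shuffled together, so each feature row stays paired with its own label after the split.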
- Import train_test_split from sklearn.model_selection.
- Define X as all the features (exclude "target").
- Define y as the "target" variable.
- Split the data into a training set (67%) and a test set (33%).
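One possible way to carry out these steps is sketched below. The exercise dataset is not shown here, so the DataFrame df, its feature column names, and its 100 random rows are hypothetical stand-ins; only the "target" column name comes from the task. The random_state value is also an assumption, added so the split is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the exercise dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(size=100),
    "target": rng.integers(0, 2, size=100),
})

# X holds every column except "target"; y is the "target" column.
X = df.drop(columns="target")
y = df["target"]

# test_size=0.33 keeps roughly 67% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
print(len(X_train), len(X_test))
```

Passing only test_size is enough: train_test_split assigns the remaining rows to the training set.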