Data Preprocessing

Wine Dataset

Now we will train our model on a more realistic task. The scikit-learn library includes a wine dataset, which we will use to predict the class of a wine from three input features.

Here is what the dataset looks like:

```python
import pandas as pd  # Import pandas to build DataFrames from the loaded dataset
from sklearn.datasets import load_wine  # Import the dataset loading function
from IPython.display import display  # `display` is built into Jupyter; import it explicitly elsewhere

wine_ds = load_wine()  # Load the dataset
# Extract the three input features used for prediction
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
y = pd.DataFrame(wine_ds.target, columns=['target'])  # Extract the target values (wine classes)

# Display the datasets
display(X.head())  # `X` holds the input values used to predict the target
display(pd.DataFrame(y.value_counts()))  # `y` is the target we want to predict; it has 3 classes
```

To train our model, we'll use three input features: flavanoids, proline, and total_phenols. We chose them because they are among the features with the highest correlation with the target class. Using only a few features reduces the size of the neural network needed for successful training and shortens the training time.
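One way to check this choice is to rank every feature by the absolute value of its correlation with the class label. The sketch below assumes Pearson correlation against the integer-encoded target (0, 1, 2); because the classes are categorical, this is only a rough heuristic and not necessarily how the features were originally picked.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine_ds = load_wine()
features = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)
target = pd.Series(wine_ds.target, name='target')

# Pearson correlation of each feature with the integer class label.
# Assumption: treating the label as numeric is a rough heuristic for categorical classes.
correlations = features.corrwith(target)

# Sort by absolute correlation, strongest first
ranked = correlations.reindex(correlations.abs().sort_values(ascending=False).index)
print(ranked)
```

Ranking by absolute value matters here: a strong negative correlation is just as informative as a strong positive one.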

Data Preprocessing

Here's how we'll prepare the data for training:

Data Scaling: Unlike decision trees or random forests, neural networks need scaled input data to perform well. Scaling improves numerical stability, speeds up convergence, and makes the model independent of the units each feature is measured in. Always scale your data before passing it through a neural network;
One-Hot Encoding: Our target values comprise three classes, stored in a single column as the numbers 0, 1, and 2. Neural networks perform better when these classes are encoded into three separate columns, one per class, where the column of the sample's class holds 1 and the others hold 0;
Train-Test Data Split: Evaluating the model on the same data it was trained on doesn't give a realistic measure of its performance on new, unseen data, so we hold out part of the dataset for testing.
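The scaling and one-hot encoding steps can be sketched with scikit-learn and NumPy. This sketch assumes standardization (zero mean, unit variance per feature) via `StandardScaler`; other scalers such as `MinMaxScaler` would also work, and the course solution may use a different one. The one-hot encoding indexes rows of an identity matrix with `np.eye`, one of several equivalent approaches (`pd.get_dummies` and `sklearn.preprocessing.OneHotEncoder` are others).

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine_ds = load_wine()
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
y = wine_ds.target  # integer labels 0, 1, 2

# Data scaling: proline values are roughly two orders of magnitude larger than
# flavanoids and total_phenols, so standardize each column independently
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # NumPy array; each column now has mean 0 and std 1

# One-hot encoding: row i of the identity matrix encodes class i
n_classes = len(np.unique(y))
y_onehot = np.eye(n_classes)[y]  # shape (n_samples, 3), a single 1 per row

print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
print(y[:3], y_onehot[:3])
```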

Task

Prepare the wine dataset for our neural network:

Extract the input values from the dataset.
Scale the input values.
Split the data into train and test sets, using 40% of the data as the test set.
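A minimal sketch of the three task steps follows. It uses `StandardScaler` and `train_test_split` from scikit-learn; the choice of scaler and the `random_state=42` seed are assumptions added for reproducibility, not necessarily what the reference solution uses.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

wine_ds = load_wine()

# 1. Extract the input values (the three selected features) and the target
X = pd.DataFrame(wine_ds.data, columns=wine_ds.feature_names)[['flavanoids', 'proline', 'total_phenols']]
y = wine_ds.target

# 2. Scale the input values (assumption: standardization)
X_scaled = StandardScaler().fit_transform(X)

# 3. Split into train and test sets, holding out 40% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.4, random_state=42
)

print(X_train.shape, X_test.shape)
```

This follows the task's order (scale, then split). A common alternative is to fit the scaler on the training set only (`scaler.fit(X_train)`) and then apply `scaler.transform` to both sets, which keeps test-set statistics from leaking into training.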

Section 2. Chapter 5
