PyTorch Essentials
Training the Model
In this chapter, we'll focus on training the neural network we created in the previous chapter on the wine quality dataset. The goal is to predict a wine's quality category from its features. We'll define the loss function, the optimizer, and the training loop, monitoring the model's performance over multiple epochs.
Preparing for Training
First, we need to ensure that the model, loss function, and optimizer are properly defined. Let’s go through each step:
- Loss function: for multi-class classification we use CrossEntropyLoss, which expects raw logits as input and applies softmax (more precisely, log-softmax) internally, as illustrated in the sketch after this list.
- Optimizer: we'll use the Adam optimizer for efficient gradient updates.
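Here's a minimal sketch of these two definitions on dummy tensors; the 3 output classes, 11 input features, and placeholder linear model are assumptions made purely for illustration:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Raw logits for 4 samples and 3 classes, as a model would produce them (no softmax applied)
logits = torch.randn(4, 3)
# Integer class labels, one per sample
targets = torch.tensor([0, 2, 1, 1])

# CrossEntropyLoss takes raw logits plus class indices and applies log-softmax internally
criterion = nn.CrossEntropyLoss()
print(criterion(logits, targets).item())

# Adam optimizer over a placeholder model's parameters (11 input features is an assumption)
model = nn.Linear(11, 3)
optimizer = optim.Adam(model.parameters(), lr=0.01)
```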
Training Loop
The training loop performs the following steps in each epoch:
- Gradient Reset: Clear the gradients accumulated in the previous iteration (PyTorch accumulates them by default).
- Forward Pass: Pass the input features through the model to generate predictions.
- Loss Calculation: Compare the predictions with the ground truth using the loss function.
- Backward Pass: Compute gradients of the loss with respect to the model parameters using backpropagation.
- Parameter Update: Adjust the model parameters using the optimizer.
- Progress Monitoring: Print the loss periodically to observe convergence.
Implementation
Here's how the training loop is implemented (model, X_train, and y_train come from the previous chapters):

```python
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

# Define the loss function (cross-entropy for multi-class classification)
criterion = nn.CrossEntropyLoss()

# Define the optimizer (Adam with a learning rate of 0.01)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Set manual seed for reproducibility
torch.manual_seed(42)

# Number of epochs
epochs = 100

# Store losses for plotting
training_losses = []

# Training loop
for epoch in range(epochs):
    # Zero out gradients from the previous step
    optimizer.zero_grad()

    # Forward pass: compute predictions
    predictions = model(X_train)

    # Compute the loss
    loss = criterion(predictions, y_train)

    # Backward pass: compute gradients
    loss.backward()

    # Update parameters
    optimizer.step()

    # Store the loss
    training_losses.append(loss.item())

    # Print the loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss.item():.4f}")

# Plot the training loss
plt.plot(range(epochs), training_losses, label="Training Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss Over Epochs")
plt.legend()
plt.show()
```

Observing Convergence
- Convergence point: look for the point where the training loss stabilizes. If the loss stops decreasing significantly, the model has likely converged.
- Adjusting hyperparameters: if the loss doesn't decrease well, consider the options below (a short sketch follows this list):
- Lowering the learning rate.
- Increasing the number of epochs.
- Checking that the input data is properly scaled and free of quality issues.
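As a rough sketch of these adjustments, here is one way to standardize the features, lower the learning rate, and train for more epochs. The synthetic data, the 11-feature/3-class shapes, the small Sequential model, and the specific values 0.001 and 300 are illustrative assumptions, not fixed parts of this course's pipeline:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Synthetic stand-ins for the wine data (11 features and 3 classes are assumptions
# made only for this sketch; in the course you would reuse X_train and y_train)
X_raw = torch.randn(100, 11) * 50 + 10        # deliberately badly scaled features
y_train = torch.randint(0, 3, (100,))

# 1) Check input scaling: standardize each feature to zero mean and unit variance
X_train = (X_raw - X_raw.mean(dim=0)) / X_raw.std(dim=0)

# 2) Lower the learning rate: recreate the optimizer with a smaller lr (0.01 -> 0.001)
model = nn.Sequential(nn.Linear(11, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 3) Increase the number of epochs and retrain
epochs = 300
for epoch in range(epochs):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 50 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss.item():.4f}")
```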