Introduction to Neural Networks
Backward Propagation
Warning
Backpropagation is often the most confusing part of neural network training. At its core, it relies on the gradient descent algorithm, so a good understanding of calculus helps.
Backpropagation Structure
We can split the backpropagation algorithm into several steps:
Forward Propagation:
At this step, we pass our inputs through the perceptron and store the output (Neuron.output) of every neuron. We already implemented this part in the previous chapter.
Error Computing:
In this phase, we determine the individual error for each neuron. This error indicates the difference between the neuron's output and the desired output.
For neurons in the output layer, this is straightforward: when given a specific input, the error represents the difference between the neural network's prediction and the actual target value.
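For neurons in the hidden layers, the desired output is not known directly, so the error is derived from the subsequent layer: a hidden neuron's error is the sum of the next layer's deltas, each weighted by the connection leading from that neuron.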
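The two kinds of error described above can be sketched as follows. This is a minimal illustration, not the course's own code: the function names, the sign convention (target minus output), and the `next_weights[j][i]` layout are assumptions made for the example.

```python
def output_errors(targets, outputs):
    """Error of each output neuron: target value minus prediction (assumed sign)."""
    return [t - o for t, o in zip(targets, outputs)]


def hidden_errors(next_weights, next_deltas):
    """Error of each hidden neuron, derived from the next layer.

    next_weights[j][i] is the (hypothetical) weight from hidden neuron i
    to neuron j of the next layer; next_deltas[j] is that neuron's delta.
    """
    n_hidden = len(next_weights[0])
    return [
        sum(next_weights[j][i] * next_deltas[j] for j in range(len(next_deltas)))
        for i in range(n_hidden)
    ]
```

With a single output neuron, each hidden neuron's error is simply that neuron's delta scaled by the weight connecting the two.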
Calculating the Gradient (Delta):
At this stage, we calculate the magnitude and direction of each neuron's deviation by multiplying the neuron's error by the derivative of its activation function (in this case, sigmoid), evaluated at the neuron's output.
This computation should be performed together with the error calculation, layer by layer, because the current layer's gradient (delta) is needed to determine the error in the preceding layer. This dependency is also why the process runs from the output layer to the input layer (the backward direction).
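As a small sketch of the delta computation, the sigmoid derivative can be written in terms of the neuron's stored output, since sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). The function names below are illustrative assumptions, not the course's API.

```python
import math


def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(output):
    """Derivative of sigmoid, expressed via the neuron's already-computed output."""
    return output * (1.0 - output)


def neuron_delta(error, output):
    """Delta: the neuron's error scaled by the slope of its activation."""
    return error * sigmoid_derivative(output)
```

Because the derivative only needs the stored output, the forward pass does not have to be repeated during the backward pass.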
Modifying Weights and Biases (Taking a Step in Gradient Descent):
The last step of the backpropagation process involves updating the neurons' weights and biases according to their respective deltas.
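A minimal sketch of this update for a single neuron might look like the following. It assumes the error was defined as target minus output, so the adjustment is added, and it takes the step size (the learning rate discussed in the next section) as a parameter. `update_neuron` is a hypothetical helper, not part of the course code.

```python
def update_neuron(weights, bias, inputs, delta, learning_rate):
    """Return updated (weights, bias) after one gradient descent step.

    Assumes error = target - output, so adjustments are added.
    """
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    # The bias behaves like a weight whose input is always 1.
    new_bias = bias + learning_rate * delta
    return new_weights, new_bias
```

Note that a weight whose input was 0 does not change, because that input did not contribute to the neuron's output.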
Note
Error computing and calculating the gradient should progress in reverse order, moving from the output layer towards the input layer.
Learning Rate
Another crucial aspect of model training is the learning rate. As part of the gradient descent algorithm, it sets the size of each update step and can be thought of as the pace of training.
A higher learning rate accelerates training; however, an excessively high rate can make the updates overshoot good weight values, so the network may fail to converge and miss patterns in the data.
Note
The learning rate is a floating-point value, typically between 0 and 1. It is used in the last step of the backpropagation algorithm to scale down the adjustments applied to the weights and biases. Choosing a good learning rate is part of hyperparameter tuning.
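To illustrate the effect of the learning rate in isolation, here is a toy sketch of gradient descent on a one-variable function, f(w) = 2w², whose gradient is 4w. The function and the two rates are chosen purely for illustration and are not taken from the course.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize the toy function f(w) = 2 * w**2 and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 4.0 * w            # derivative of 2 * w**2
        w -= learning_rate * gradient  # one gradient descent step
    return w


small_rate_result = gradient_descent(0.1)  # w is multiplied by 0.6 each step
large_rate_result = gradient_descent(0.6)  # w is multiplied by -1.4 each step
```

With the smaller rate, w shrinks toward the minimum at 0. With the larger rate, every step overshoots the minimum by more than the previous one, so w grows in magnitude instead of converging.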
Epochs
Every time our perceptron processes the entire dataset, we refer to it as an epoch. To effectively recognize patterns in the data, it's essential to feed our entire dataset into the model multiple times.
We can use the XOR example as a sanity check that our model is set up correctly. XOR has only four unique input combinations, all taken from the truth table discussed in the preceding chapter.
Training our neural network on these examples for 10,000 epochs with a learning rate of 0.2 should give the model enough iterations to learn the pattern.
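The whole loop can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the course's implementation: neurons are stored as dictionaries rather than a Neuron class, the hidden layer has four neurons, weights start uniformly in [-1, 1] with a fixed seed, and the error is taken as target minus output.

```python
import math
import random

XOR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_neurons, n_inputs, rng):
    # Hypothetical layout: each neuron is a dict with weights and a bias.
    return [
        {"weights": [rng.uniform(-1, 1) for _ in range(n_inputs)],
         "bias": rng.uniform(-1, 1)}
        for _ in range(n_neurons)
    ]


def forward(layers, inputs):
    # Step 1: store every neuron's output for use in the backward pass.
    for layer in layers:
        outputs = []
        for neuron in layer:
            z = sum(w * x for w, x in zip(neuron["weights"], inputs)) + neuron["bias"]
            neuron["output"] = sigmoid(z)
            outputs.append(neuron["output"])
        inputs = outputs
    return inputs


def backward(layers, targets):
    # Steps 2 and 3: errors and deltas, from the output layer backward.
    for i in reversed(range(len(layers))):
        for j, neuron in enumerate(layers[i]):
            if i == len(layers) - 1:
                error = targets[j] - neuron["output"]
            else:
                error = sum(n["weights"][j] * n["delta"] for n in layers[i + 1])
            neuron["delta"] = error * neuron["output"] * (1.0 - neuron["output"])


def update(layers, inputs, learning_rate):
    # Step 4: adjust weights and biases, scaled by the learning rate.
    for layer in layers:
        for neuron in layer:
            for k, x in enumerate(inputs):
                neuron["weights"][k] += learning_rate * neuron["delta"] * x
            neuron["bias"] += learning_rate * neuron["delta"]
        inputs = [n["output"] for n in layer]


def mean_squared_error(layers):
    return sum((t - forward(layers, x)[0]) ** 2 for x, t in XOR_DATA) / len(XOR_DATA)


def train_xor(epochs=10_000, learning_rate=0.2, hidden=4, seed=0):
    rng = random.Random(seed)
    layers = [init_layer(hidden, 2, rng), init_layer(1, hidden, rng)]
    loss_before = mean_squared_error(layers)
    for _ in range(epochs):          # one epoch = one pass over all four rows
        for x, t in XOR_DATA:
            forward(layers, x)
            backward(layers, [t])
            update(layers, x, learning_rate)
    return layers, loss_before, mean_squared_error(layers)
```

Calling `train_xor()` returns the trained layers together with the mean squared error before and after training. Comparing the two values, and checking `forward(layers, x)` on each of the four rows, is a quick way to see whether the setup is learning.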
Task
Implement the backpropagation algorithm:
- Run forward propagation.
- Calculate the errors of the neurons.
- Calculate the deltas of the neurons.
- Apply the learning rate when updating the biases.
Note
The code has several blanks to fill in for tasks 2-4.