Backpropagation Implementation

General Approach

In forward propagation, each layer l takes the outputs from the previous layer, a^{l-1}, as inputs and computes its own outputs. Therefore, the forward() method of the Layer class takes the vector of previous outputs as its only parameter, while the rest of the needed information is stored within the class.

In backward propagation, each layer l only needs da^l to compute the respective gradients and return da^{l-1}, so the backward() method takes the da^l vector as its parameter. The rest of the required information is already stored in the Layer class.
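
As a rough sketch, the forward pass of the Layer class could look like this (the attribute names weights and biases are assumptions; inputs, outputs and activation match the attributes described later in this chapter):

```python
import numpy as np

class Layer:
    # Rough sketch of the forward pass only; the constructor, which would set
    # self.weights, self.biases and self.activation, is omitted here.
    def forward(self, a_prev):
        self.inputs = a_prev  # a^{l-1}, stored for use in backward()
        self.outputs = np.dot(self.weights, self.inputs) + self.biases  # z^l
        return self.activation(self.outputs)  # a^l
```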

Activation Function Derivatives

Since derivatives of activation functions are needed for backpropagation, activation functions like ReLU and sigmoid should be structured as classes instead of standalone functions. This allows us to define both:

  1. The activation function itself (implemented via the __call__() method), allowing it to be applied directly in the Layer class using self.activation(z);

  2. Its derivative (implemented via the derivative() method), enabling efficient backpropagation and used in the Layer class as self.activation.derivative(z).

By structuring activation functions as objects, we can easily pass them to the Layer class and use them dynamically.

ReLU

The derivative of the ReLU activation function is as follows, where z_i is an element of the vector of pre-activations z:

f'(z_i) = \begin{cases} 1, & z_i > 0 \\ 0, & z_i \le 0 \end{cases}
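
A minimal sketch of how this could be implemented as a class (the class name ReLU is an assumption):

```python
import numpy as np

class ReLU:
    def __call__(self, z):
        # f(z) = max(0, z), applied element-wise
        return np.maximum(0, z)

    def derivative(self, z):
        # f'(z) = 1 where z > 0, 0 otherwise
        return (z > 0).astype(float)
```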

Sigmoid

The derivative of the sigmoid activation function is as follows:

f'(z_i) = f(z_i) \cdot (1 - f(z_i))
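
A corresponding sketch for the sigmoid class (again, the class name Sigmoid is an assumption):

```python
import numpy as np

class Sigmoid:
    def __call__(self, z):
        # f(z) = 1 / (1 + e^(-z)), applied element-wise
        return 1 / (1 + np.exp(-z))

    def derivative(self, z):
        # f'(z) = f(z) * (1 - f(z)), reusing __call__ for f(z)
        s = self(z)
        return s * (1 - s)
```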

For both of these activation functions, we apply them to the entire vector z, and the same goes for their derivatives. NumPy applies the operation to each element of the vector internally. For example, if the vector z contains 3 elements, the derivative is computed as follows:

f'(z) = f'\left( \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \right) = \begin{bmatrix} f'(z_1) \\ f'(z_2) \\ f'(z_3) \end{bmatrix}
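
For example, using the ReLU class sketched above on a column vector of three pre-activations:

```python
import numpy as np

relu = ReLU()
z = np.array([[-1.0], [0.5], [2.0]])
# Element-wise derivative: 0 for -1.0, 1 for 0.5, 1 for 2.0
print(relu.derivative(z))
```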

The backward() Method

The backward() method is responsible for computing the gradients using the formulas below:

\begin{aligned} dz^l &= da^l \odot f'^l(z^l) \\ dW^l &= dz^l \cdot (a^{l-1})^T \\ db^l &= dz^l \\ da^{l-1} &= (W^l)^T \cdot dz^l \end{aligned}

a^{l-1} and z^l are stored as the inputs and outputs attributes of the Layer class, respectively. The activation function f is stored as the activation attribute.

Once all the required gradients are computed (including da^{l-1}, which uses the current weights), the weights and biases can be updated, since their old values are no longer needed for any further computation:

\begin{aligned} W^l &= W^l - \alpha \cdot dW^l \\ b^l &= b^l - \alpha \cdot db^l \end{aligned}

Therefore, learning_rate (α) is another parameter of this method.

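A possible sketch of this method, assuming the layer's parameters are stored as self.weights and self.biases (these attribute names are assumptions; inputs, outputs and activation are the attributes described above):

```python
import numpy as np

class Layer:
    # Only backward() is sketched here; forward() must have already stored
    # self.inputs (a^{l-1}) and self.outputs (z^l).
    def backward(self, da, learning_rate):
        # dz^l = da^l ⊙ f'^l(z^l)
        dz = da * self.activation.derivative(self.outputs)
        # dW^l = dz^l · (a^{l-1})^T and db^l = dz^l
        d_weights = np.dot(dz, self.inputs.T)
        d_biases = dz
        # da^{l-1} = (W^l)^T · dz^l, computed before the weights are updated
        da_prev = np.dot(self.weights.T, dz)
        # Gradient descent update: W^l -= α·dW^l, b^l -= α·db^l
        self.weights -= learning_rate * d_weights
        self.biases -= learning_rate * d_biases
        return da_prev
```
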
Note

The * operator performs element-wise multiplication, while the np.dot() function performs dot product in NumPy. The .T attribute transposes an array.
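
For instance:

```python
import numpy as np

a = np.array([[1.0], [2.0]])
b = np.array([[3.0], [4.0]])

print(a * b)           # element-wise multiplication: [[3.], [8.]]
print(np.dot(a.T, b))  # dot product using the transpose of a: [[11.]]
```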
