Introduction to Neural Networks
Perceptron Layers
The perceptron is the simplest neural network, consisting of only one neuron. To solve more complex problems, however, we will build a model called the multilayer perceptron (MLP). A multilayer perceptron consists of one or more hidden layers. Its structure looks like this:
- Input layer: receives the input data;
- Hidden layers: process the data and extract patterns;
- Output layer: produces the final prediction or classification.
In general, each layer consists of multiple neurons, and the output from one layer becomes the input for the next layer.
Layer Weights and Biases
Before implementing a layer, it is important to understand how to store the weights and biases of each neuron within it. In the previous chapter, you learned how to store the weights of a single neuron as a vector and its bias as a scalar (single number).
Since a layer consists of multiple neurons, it is natural to represent the weights as a matrix, where each row corresponds to the weights of a specific neuron. Consequently, biases can be represented as a vector, whose length is equal to the number of neurons.
Given a layer with $n$ inputs and $m$ neurons, its weights will be stored in an $m \times n$ matrix $W$, and its biases will be stored in a vector $b$ of length $m$. For a layer with two neurons, they look as follows:

$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$

Here, element $w_{ij}$ represents the weight of the $j$-th input to the $i$-th neuron, so the first row contains the weights of the first neuron, and the second row contains the weights of the second neuron. Element $b_i$ represents the bias of the $i$-th neuron (two neurons → two biases).
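As a quick sanity check in code (the three-input size here is an arbitrary example), these shapes map directly onto NumPy arrays:

```python
import numpy as np

# Example: a layer with 3 inputs and 2 neurons
W = np.random.uniform(-1, 1, (2, 3))  # row i holds the weights of neuron i
b = np.random.uniform(-1, 1, (2, 1))  # one bias per neuron, as a column vector

print(W.shape)  # (2, 3)
print(b.shape)  # (2, 1)
```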
Forward Propagation
Performing forward propagation for each layer means activating each of its neurons by computing the weighted sum of the inputs, adding the bias, and applying the activation function.
Previously, for a single neuron, you computed the weighted sum of the inputs as a dot product between the input vector and the weight vector, then added the bias.
Since each row of the weight matrix contains the weight vector of a particular neuron, all you have to do now is perform a dot product between each row of the matrix and the input vector. Luckily, this is exactly what matrix multiplication does:

$$Wx$$
To add the biases to the outputs of the respective neurons, the bias vector is added as well:

$$Wx + b$$
Finally, the activation function $f$ (sigmoid or ReLU, in our case) is applied to the result. The resulting formula for forward propagation in the layer is as follows:

$$a = f(Wx + b)$$

where $a$ is the vector of neuron activations (outputs).
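To make the formula concrete, here is a small worked example with made-up numbers:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A layer with 3 inputs and 2 neurons (illustrative values)
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])
b = np.array([[0.1],
              [-0.3]])
x = np.array([[1.0],
              [2.0],
              [3.0]])

z = W @ x + b    # weighted sums plus biases: [[0.5], [0.1]]
a = sigmoid(z)   # activations: approximately [[0.6225], [0.5250]]
print(a)
```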
Layer Class
The perceptron's fundamental building blocks are its layers; therefore, it makes sense to create a separate `Layer` class. Its attributes include:

- `inputs`: a vector of inputs (`n_inputs` is the number of inputs);
- `outputs`: a vector of raw output values (before applying the activation function) of the neurons (`n_neurons` is the number of neurons);
- `weights`: a weight matrix;
- `biases`: a bias vector;
- `activation_function`: the activation function used in the layer.
Like in the single neuron implementation, `weights` and `biases` will be initialized with random values between -1 and 1, drawn from a uniform distribution.
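A minimal sketch of the constructor under these assumptions (the course's exact code may differ):

```python
import numpy as np

class Layer:
    def __init__(self, n_inputs, n_neurons, activation_function):
        # One row of weights per neuron, one column per input
        self.weights = np.random.uniform(-1, 1, (n_neurons, n_inputs))
        # One bias per neuron, stored as a column vector
        self.biases = np.random.uniform(-1, 1, (n_neurons, 1))
        self.activation_function = activation_function
        # Zero-filled placeholders, used later during backpropagation
        self.inputs = np.zeros((n_inputs, 1))
        self.outputs = np.zeros((n_neurons, 1))
```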
The `inputs` and `outputs` attributes will be used later in backpropagation, so it makes sense to initialize them as NumPy arrays of zeros.
Initializing `inputs` and `outputs` as zero-filled NumPy arrays prevents errors when performing calculations in forward and backward propagation. It also ensures consistency across layers, allowing smooth matrix operations without requiring additional checks.
Forward propagation can be implemented in the `forward()` method, where `outputs` are computed based on the `inputs` vector using NumPy, following the formula above:
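A sketch of how this method might look inside the `Layer` class, assuming `outputs` stores the raw pre-activation values as described above:

```python
def forward(self, inputs):
    # Store the inputs as a column vector for later use in backpropagation
    self.inputs = np.array(inputs).reshape(-1, 1)
    # Raw outputs: weighted sums of the inputs plus the biases
    self.outputs = self.weights @ self.inputs + self.biases
    # Apply the activation function to get the neuron activations
    return self.activation_function(self.outputs)
```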
Reshaping `inputs` into a column vector ensures correct matrix multiplication with the weight matrix during forward propagation. This prevents shape mismatches and allows seamless computations across all layers.
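As a usage sketch, two such layers can be chained so that the output of one becomes the input of the next (the layer sizes and activation functions here are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

# Using the Layer class sketched above:
# a hidden layer mapping 3 inputs to 4 neurons,
# and an output layer mapping those 4 activations to 2 outputs
hidden = Layer(3, 4, relu)
output = Layer(4, 2, sigmoid)

x = np.array([0.5, -1.0, 2.0])
h = hidden.forward(x)   # hidden activations, shape (4, 1)
y = output.forward(h)   # final predictions, shape (2, 1)
print(y)
```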
1. What makes a multilayer perceptron (MLP) more powerful than a simple perceptron?
2. Why do we reshape `inputs` into a column vector before multiplying it by the weight matrix?