Challenge: Training the Perceptron
Before proceeding with training the perceptron, keep in mind that it uses the binary cross-entropy loss function discussed earlier. The final key concept before implementing backpropagation is the formula for the derivative of this loss function with respect to the output activations, $a^n$. Below are the formulas for the loss function and its derivative:
$$L = -\bigl(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\bigr)$$

$$\frac{dL}{da^n} = \frac{\hat{y}-y}{\hat{y}(1-\hat{y})}, \quad \text{where } a^n = \hat{y}$$
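These two formulas can be checked against each other numerically. The sketch below (the function names are illustrative, not from the lesson's code) computes the loss and its analytic derivative, then compares the derivative to a central finite-difference estimate:

```python
import numpy as np

def bce_loss(y, y_hat):
    """Binary cross-entropy for a single example."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def bce_loss_grad(y, y_hat):
    """Derivative of the loss w.r.t. the output activation a^n = y_hat."""
    return (y_hat - y) / (y_hat * (1 - y_hat))

# Compare the analytic gradient with a finite-difference estimate.
y, y_hat, eps = 1.0, 0.8, 1e-6
numeric = (bce_loss(y, y_hat + eps) - bce_loss(y, y_hat - eps)) / (2 * eps)
print(bce_loss_grad(y, y_hat), numeric)
```

For `y = 1` the loss reduces to `-log(y_hat)`, whose derivative is `-1 / y_hat`; both expressions above give `-1.25` at `y_hat = 0.8`, confirming the formula.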
To verify that the perceptron is training correctly, the fit() method also prints the average loss at each epoch. This is calculated by averaging the loss over all training examples in that epoch:
```python
for epoch in range(epochs):
    loss = 0
    for i in range(training_data.shape[0]):
        # ... forward pass producing `output` for the current `target` ...
        loss += -(target * np.log(output) + (1 - target) * np.log(1 - output))
    average_loss = loss[0, 0] / training_data.shape[0]
    print(f'Loss at epoch {epoch + 1}: {average_loss:.3f}')
```
$$L = -\frac{1}{N}\sum_{i=1}^{N}\bigl(y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\bigr)$$
Finally, the formulas for computing gradients in each layer are as follows:
$$dz^l = da^l \odot f'^l(z^l)$$
$$dW^l = dz^l \cdot (a^{l-1})^T$$
$$db^l = dz^l$$
$$da^{l-1} = (W^l)^T \cdot dz^l$$

Implementation Details to Remember
When translating these formulas into Python code for the backward() method, remember the NumPy operations discussed in the previous chapters:
- The ⊙ operator denotes element-wise multiplication, which is done using the standard `*` operator in Python.
- The ⋅ operator denotes a dot product, implemented using the `np.dot()` function.
- The T superscript denotes a matrix transpose, handled by the `.T` attribute.
- To compute $f'^l(z^l)$, you can dynamically call the derivative of the layer's activation function using `self.activation.derivative(self.outputs)`.
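Each symbol in the gradient formulas maps directly to one NumPy operation. The following sketch demonstrates this mapping with small hypothetical shapes (a 3-neuron layer fed by a 2-neuron layer, using column-vector activations; none of these values come from the lesson's data):

```python
import numpy as np

# Hypothetical layer state: layer l has 3 neurons, layer l-1 has 2.
da = np.array([[0.5], [-0.2], [0.1]])   # dL/da^l, shape (3, 1)
z = np.array([[1.0], [-1.0], [2.0]])    # pre-activations z^l
a_prev = np.array([[0.3], [0.7]])       # a^{l-1}, shape (2, 1)
W = np.ones((3, 2))                     # W^l, shape (3, 2)

relu_derivative = lambda z: (z > 0).astype(float)  # f'^l for ReLU

dz = da * relu_derivative(z)         # ⊙  ->  element-wise *
d_weights = np.dot(dz, a_prev.T)     # ⋅ and T  ->  np.dot() and .T
d_biases = dz                        # db^l is just dz^l
da_prev = np.dot(W.T, dz)            # (W^l)^T ⋅ dz^l

print(d_weights.shape, da_prev.shape)  # (3, 2) (2, 1)
```

Note how the shapes work out: `d_weights` matches `W` so it can be subtracted during the update, and `da_prev` matches the previous layer's activations so it can be passed further back.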
Using these operations, the general structure of the backward() method looks like this:
```python
def backward(self, da, learning_rate):
    dz = ...         # using da and self.activation.derivative()
    d_weights = ...  # using np.dot() and .T
    d_biases = ...
    da_prev = ...
    self.weights -= learning_rate * d_weights
    self.biases -= learning_rate * d_biases
    return da_prev
```
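One way the placeholders might be filled in is sketched below as a standalone function that mirrors the method body, taking the layer's state as explicit arguments (the argument names are assumptions; `inputs` holds $a^{l-1}$ and `outputs` holds the pre-activation values $z^l$):

```python
import numpy as np

def layer_backward(da, weights, biases, inputs, outputs,
                   activation_derivative, learning_rate):
    """Functional sketch of Layer.backward() under assumed naming."""
    dz = da * activation_derivative(outputs)   # da^l ⊙ f'^l(z^l)
    d_weights = np.dot(dz, inputs.T)           # dz^l ⋅ (a^{l-1})^T
    d_biases = dz                              # db^l = dz^l
    da_prev = np.dot(weights.T, dz)            # (W^l)^T ⋅ dz^l
    weights -= learning_rate * d_weights       # in-place parameter update
    biases -= learning_rate * d_biases
    return da_prev

# Tiny usage example with a ReLU derivative and dummy values.
relu_prime = lambda z: (z > 0).astype(float)
W = np.zeros((2, 3))
b = np.zeros((2, 1))
da = np.ones((2, 1))
z = np.ones((2, 1))
a_prev = np.ones((3, 1))
da_prev = layer_backward(da, W, b, a_prev, z, relu_prime, 0.1)
print(da_prev.shape)  # (3, 1)
```

Computing `da_prev` before the weight update matters: the gradient flowing to the previous layer must use the weights as they were during the forward pass.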
Similarly, when putting everything together in the fit() method, remember that you need to iterate through the network backwards to propagate the error. The general structure looks like this:
```python
def fit(self, training_data, labels, epochs, learning_rate):
    # ... (epoch loop and data shuffling) ...
        # Forward propagation
        output = ...
        # Computing the gradient of the loss function w.r.t. output (da^n)
        da = ...
        # Backward propagation through all layers
        for layer in self.layers[::-1]:
            da = ...  # call the backward() method of the layer
```
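To show how these pieces fit together end to end, here is a minimal self-contained sketch: a `Layer` and `Perceptron` with completed `backward()` and `fit()` methods, trained on a tiny AND-function dataset. The class internals, the dataset, and the higher learning rate are illustrative assumptions for this small demo, not the lesson's `utils.py` setup:

```python
import numpy as np

class Sigmoid:
    def __call__(self, z):
        return 1 / (1 + np.exp(-z))
    def derivative(self, z):
        s = self(z)
        return s * (1 - s)

class Layer:
    def __init__(self, n_inputs, n_neurons, activation, rng):
        self.weights = rng.standard_normal((n_neurons, n_inputs)) * 0.5
        self.biases = np.zeros((n_neurons, 1))
        self.activation = activation
    def forward(self, a_prev):
        self.inputs = a_prev                                        # a^{l-1}
        self.outputs = np.dot(self.weights, a_prev) + self.biases   # z^l
        return self.activation(self.outputs)
    def backward(self, da, learning_rate):
        dz = da * self.activation.derivative(self.outputs)
        d_weights = np.dot(dz, self.inputs.T)
        da_prev = np.dot(self.weights.T, dz)
        self.weights -= learning_rate * d_weights
        self.biases -= learning_rate * dz
        return da_prev

class Perceptron:
    def __init__(self, layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
    def fit(self, training_data, labels, epochs, learning_rate):
        losses = []
        for epoch in range(epochs):
            loss = 0
            for i in range(training_data.shape[0]):
                x = training_data[i].reshape(-1, 1)
                target = labels[i].reshape(-1, 1)
                output = self.forward(x)
                loss += -(target * np.log(output)
                          + (1 - target) * np.log(1 - output))
                da = (output - target) / (output * (1 - output))  # dL/da^n
                for layer in self.layers[::-1]:
                    da = layer.backward(da, learning_rate)
            losses.append(loss[0, 0] / training_data.shape[0])
        return losses

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [0.0], [0.0], [1.0]])  # AND function
net = Perceptron([Layer(2, 4, Sigmoid(), rng), Layer(4, 1, Sigmoid(), rng)])
losses = net.fit(X, y, epochs=200, learning_rate=0.5)
print(f'first {losses[0]:.3f} -> last {losses[-1]:.3f}')
```

Running this, the average loss decreases across epochs, which is exactly the behavior the exercise asks you to verify.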
The sample training data (`X_train`) along with the corresponding labels (`y_train`) are stored as NumPy arrays in the utils.py file. Instances of the activation functions are also defined there:

```python
relu = ReLU()
sigmoid = Sigmoid()
```
Your goal is to complete the training process for a multilayer perceptron by implementing backpropagation and updating the model parameters.
Follow these steps carefully:

- Implement the `backward()` method in the `Layer` class:
  - Compute the following gradients:
    - `dz`: derivative of the loss with respect to the pre-activation values, using the derivative of the activation function;
    - `d_weights`: gradient of the loss with respect to the weights, calculated as the dot product of `dz` and the transposed input vector;
    - `d_biases`: gradient of the loss with respect to the biases, equal to `dz`;
    - `da_prev`: gradient of the loss with respect to the activations of the previous layer, obtained by multiplying the transposed weight matrix by `dz`.
  - Update the weights and biases using the learning rate.
- Complete the `fit()` method in the `Perceptron` class:
  - Compute the model output by calling the `forward()` method;
  - Calculate the loss using the cross-entropy formula;
  - Compute $da^n$, the derivative of the loss with respect to the output activations;
  - Loop backward through the layers, performing backpropagation by calling each layer's `backward()` method.
- Check the training behavior: if everything is implemented correctly, the loss should steadily decrease with each epoch when using a learning rate of `0.01`.