Challenge: Implementing Feed-Forward Networks | Building Transformer Components
Transformers for Natural Language Processing
Section 2. Chapter 3

As you explore the Transformer architecture for natural language processing, you encounter a crucial component inside each Transformer block: the position-wise feed-forward network (FFN). After the self-attention mechanism processes input representations, the FFN further transforms these representations at each position in the sequence, independently of other positions. This means that for every token in a sentence, the same small neural network is applied, allowing the model to introduce additional non-linearity and learn more complex patterns from the text. The FFN is essential for capturing relationships and refining the information encoded by self-attention, especially when dealing with the subtleties and ambiguities of human language.
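Written as a single equation (in the same notation as the code that follows), the sub-layer computes:

FFN(x) = max(0, x·W1 + b1)·W2 + b2

where W1 and b1 project each token's vector up to the inner dimension d_ff, and W2 and b2 project it back down to d_model.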

import numpy as np

def relu(x):
    return np.maximum(0, x)

class PositionWiseFeedForward:
    def __init__(self, d_model, d_ff):
        # Initialize weights and biases for two linear layers
        self.W1 = np.random.randn(d_model, d_ff) * 0.01
        self.b1 = np.zeros((1, d_ff))
        self.W2 = np.random.randn(d_ff, d_model) * 0.01
        self.b2 = np.zeros((1, d_model))

    def __call__(self, x):
        # x shape: (batch_size, seq_len, d_model)
        # Apply first linear layer and ReLU activation
        out1 = relu(np.matmul(x, self.W1) + self.b1)
        # Apply second linear layer
        out2 = np.matmul(out1, self.W2) + self.b2
        return out2

# Example usage:
batch_size = 2
seq_len = 4
d_model = 8
d_ff = 16

# Example input: random tensor simulating text representations
x = np.random.randn(batch_size, seq_len, d_model)

ffn = PositionWiseFeedForward(d_model, d_ff)
output = ffn(x)
print("Output shape:", output.shape)

The code above is a simple implementation of a position-wise feed-forward network using NumPy. The network consists of two linear transformations (matrix multiplications) separated by a ReLU activation function.

Definition

ReLU activation function: The ReLU (Rectified Linear Unit) activation is defined as relu(x) = max(0, x). It sets all negative values to zero and keeps positive values unchanged. ReLU is used in the feed-forward network to introduce non-linearity, allowing the network to learn more complex patterns from the data.
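To make the definition concrete, here is a minimal sketch of ReLU applied to a small array (the input values are purely illustrative):

```python
import numpy as np

def relu(x):
    # Element-wise max with 0: negatives become 0, positives pass through
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # every negative entry becomes 0.0; 0.5 and 2.0 are unchanged
```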

The first linear layer projects the input from d_model dimensions (the size of each token's embedding) to a higher-dimensional space d_ff, allowing the model to capture more complex features. The second linear layer projects the result back to the original d_model size. Notice that this network is applied independently to each position in the sequence, which means that the transformation for one token does not directly affect others. This independence helps the model process each token's representation in parallel, making Transformers highly efficient for text data.
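You can verify this position-wise independence numerically: applying the FFN to the whole sequence at once gives the same result as applying it to each position separately. This is a self-contained sketch (the seed and dimensions are arbitrary choices, not from the lesson):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
W1 = rng.normal(size=(d_model, d_ff)) * 0.01
b1 = np.zeros((1, d_ff))
W2 = rng.normal(size=(d_ff, d_model)) * 0.01
b2 = np.zeros((1, d_model))

def ffn(x):
    # Two linear layers with a ReLU in between, as in the lesson's class
    return np.matmul(relu(np.matmul(x, W1) + b1), W2) + b2

x = rng.normal(size=(2, 4, d_model))  # (batch_size, seq_len, d_model)

# Apply the FFN to the whole sequence at once...
full = ffn(x)
# ...and to each position separately, then re-stack along the sequence axis
per_position = np.stack([ffn(x[:, t, :]) for t in range(x.shape[1])], axis=1)

print(np.allclose(full, per_position))  # True
```

Because no computation mixes information across the sequence axis, the two paths agree exactly; attention, not the FFN, is where tokens interact.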

Task

Swipe to start coding

Implement a position-wise feed-forward network function using numpy.

Define a function position_wise_ffn(x, W1, b1, W2, b2) that takes:

  • x: a numpy array of shape (batch_size, seq_len, d_model);
  • W1: a numpy array of shape (d_model, d_ff);
  • b1: a numpy array of shape (1, d_ff);
  • W2: a numpy array of shape (d_ff, d_model);
  • b2: a numpy array of shape (1, d_model).

For each position in the sequence, apply:

  • A linear transformation: out1 = x @ W1 + b1;
  • A ReLU activation: out1 = relu(out1);
  • A second linear transformation: out2 = out1 @ W2 + b2.

Return the output array out2 with shape (batch_size, seq_len, d_model).

Solution
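One possible implementation, sketched directly from the shape contract above (the final shape check is just a sanity test, not part of the required function):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def position_wise_ffn(x, W1, b1, W2, b2):
    # First linear transformation to d_ff, followed by ReLU
    out1 = relu(np.matmul(x, W1) + b1)
    # Second linear transformation back to d_model
    out2 = np.matmul(out1, W2) + b2
    return out2

# Quick shape check with random inputs
batch_size, seq_len, d_model, d_ff = 2, 4, 8, 16
x = np.random.randn(batch_size, seq_len, d_model)
W1 = np.random.randn(d_model, d_ff) * 0.01
b1 = np.zeros((1, d_ff))
W2 = np.random.randn(d_ff, d_model) * 0.01
b2 = np.zeros((1, d_model))
print(position_wise_ffn(x, W1, b1, W2, b2).shape)  # (2, 4, 8)
```

NumPy's `matmul` broadcasts over the leading batch dimension, so the 3-D input needs no explicit loop over positions; the biases of shape `(1, d_ff)` and `(1, d_model)` broadcast across batch and sequence automatically.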

