How to Generate Sinusoidal Positional Encoding
Sinusoidal positional encoding lets the transformer model sense word order and position, even though it does not use recurrence or sequence-aware layers. Each position is represented by a distinct pattern of sine and cosine values spread across the embedding dimensions.
Let's take a look at the code below.
```python
import numpy as np

def get_sinusoidal_positional_encoding(seq_length, embed_dim):
    position = np.arange(seq_length)[:, np.newaxis]
    div_term = np.exp(
        np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim)
    )
    pe = np.zeros((seq_length, embed_dim))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)
    return pe

# Example usage:
seq_length = 6
embed_dim = 8
encoding = get_sinusoidal_positional_encoding(seq_length, embed_dim)
print(encoding)
```
The code for generating sinusoidal positional encoding can be understood step by step:
1. Create the position array
position = np.arange(seq_length)[:, np.newaxis]
- This creates a column vector where each row represents a position in your input sequence, starting from 0.
- If your sequence has six tokens, this array will look like [0, 1, 2, 3, 4, 5] as a column.
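For a six-token sequence, this step can be checked on its own (a minimal sketch using NumPy):

```python
import numpy as np

seq_length = 6
# Column vector of positions 0..5, shape (6, 1)
position = np.arange(seq_length)[:, np.newaxis]
print(position.shape)           # (6, 1)
print(position.ravel().tolist())  # [0, 1, 2, 3, 4, 5]
```

The trailing `np.newaxis` turns the flat range into a column so it can later broadcast against the row of frequencies.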
2. Calculate the frequency scaling term
div_term = np.exp(
np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim)
)
- This calculates a scaling factor for each even embedding dimension.
- The scaling ensures that each dimension has a different frequency, letting the encoding capture both short- and long-range position patterns.
- The use of 10000.0 spreads out the frequencies, so changes in position affect each dimension differently.
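With embed_dim = 8, the exponent works out to 10000^(-2i/d) for i = 0, 1, 2, 3, which gives a clean geometric sequence of frequencies:

```python
import numpy as np

embed_dim = 8
# One scaling factor per even dimension: 10000^(-2i/d)
div_term = np.exp(np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim))
print(div_term)  # [1.0, 0.1, 0.01, 0.001]
```

Each successive pair of dimensions oscillates ten times more slowly than the previous one, covering both fine-grained and coarse position differences.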
3. Initialize the positional encoding matrix
pe = np.zeros((seq_length, embed_dim))
- This creates a matrix filled with zeros, with one row for each position and one column for each embedding dimension.
4. Fill the matrix with sine and cosine values
pe[:, 0::2] = np.sin(position * div_term)
pe[:, 1::2] = np.cos(position * div_term)
- For even columns, fill with the sine of position * div_term.
- For odd columns, fill with the cosine of position * div_term.
- This alternation means every position gets a unique combination of values, and the pattern changes smoothly across positions and dimensions.
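The alternation is easy to verify: at position 0, every sine term is sin(0) = 0 and every cosine term is cos(0) = 1, and no two positions share a row:

```python
import numpy as np

seq_length, embed_dim = 6, 8
position = np.arange(seq_length)[:, np.newaxis]
div_term = np.exp(np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim))
pe = np.zeros((seq_length, embed_dim))
pe[:, 0::2] = np.sin(position * div_term)  # even columns
pe[:, 1::2] = np.cos(position * div_term)  # odd columns

# Row 0 alternates sin(0) = 0 and cos(0) = 1
print(pe[0])  # [0. 1. 0. 1. 0. 1. 0. 1.]
# Every position gets a distinct encoding row
print(np.unique(pe, axis=0).shape[0])  # 6
```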
5. Return the positional encoding
return pe
- The resulting matrix gives you a unique encoding for each position in your sequence.
- This encoding can be added to your word embeddings so the transformer model knows the order of the tokens.
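In practice, the addition is element-wise: the encoding matrix has the same shape as the embedding matrix, so the two are simply summed. A sketch, using random numbers as stand-in word embeddings:

```python
import numpy as np

def get_sinusoidal_positional_encoding(seq_length, embed_dim):
    position = np.arange(seq_length)[:, np.newaxis]
    div_term = np.exp(np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim))
    pe = np.zeros((seq_length, embed_dim))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)
    return pe

seq_length, embed_dim = 6, 8
rng = np.random.default_rng(0)
# Placeholder embeddings; a real model would look these up per token
embeddings = rng.normal(size=(seq_length, embed_dim))
inputs = embeddings + get_sinusoidal_positional_encoding(seq_length, embed_dim)
print(inputs.shape)  # (6, 8)
```

The shapes match exactly, so the same token embedding receives a different offset depending on where it appears in the sequence.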
Section 1. Chapter 7