Transformers for Natural Language Processing

How to Generate Sinusoidal Positional Encoding


Sinusoidal positional encoding lets a transformer model sense word order and position, even though it uses no recurrence and its attention layers are inherently order-agnostic. Each position is represented by a distinct pattern of sine and cosine values spread across the embedding dimensions.
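For reference, this is the scheme introduced in the original Transformer paper ("Attention Is All You Need"). For a token at position pos, dimension-pair index i, and embedding size d:

PE(pos, 2i) = sin(pos / 10000^(2i / d))
PE(pos, 2i + 1) = cos(pos / 10000^(2i / d))

Each pair of dimensions shares one frequency, and the code below implements exactly this.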

Let's take a look at the code below.

import numpy as np

def get_sinusoidal_positional_encoding(seq_length, embed_dim):
    # Column vector of positions: shape (seq_length, 1)
    position = np.arange(seq_length)[:, np.newaxis]
    # One frequency per pair of dimensions: shape (embed_dim // 2,)
    div_term = np.exp(
        np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim)
    )
    pe = np.zeros((seq_length, embed_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even columns
    pe[:, 1::2] = np.cos(position * div_term)  # odd columns
    return pe

# Example usage:
seq_length = 6
embed_dim = 8
encoding = get_sinusoidal_positional_encoding(seq_length, embed_dim)
print(encoding)

The code for generating sinusoidal positional encoding can be understood step by step:

1. Create the position array

position = np.arange(seq_length)[:, np.newaxis]
  • This creates a column vector where each row represents a position in your input sequence, starting from 0.
  • If your sequence has six tokens, this array will look like [0, 1, 2, 3, 4, 5] as a column.
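As a quick sketch (assuming a six-token sequence, matching the example usage above), you can check the shape this produces:

import numpy as np

position = np.arange(6)[:, np.newaxis]
print(position.shape)    # (6, 1), a column vector
print(position.ravel())  # [0 1 2 3 4 5]

The extra axis from np.newaxis matters later: it lets NumPy broadcast the positions against the frequency terms to build the full matrix in one step.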

2. Calculate the frequency scaling term

div_term = np.exp(
    np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim)
)
  • This calculates a scaling factor for each even embedding dimension.
  • The scaling ensures that each dimension has a different frequency, letting the encoding capture both short- and long-range position patterns.
  • The base 10000.0 produces a geometric progression of frequencies, so the first dimensions oscillate quickly with position while later dimensions oscillate ever more slowly.
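The exp/log form is just a numerically convenient way of writing 1 / 10000^(2i / embed_dim). A small check (assuming embed_dim = 8, as in the example usage) makes this concrete:

import numpy as np

embed_dim = 8
div_term = np.exp(np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim))
# Equivalent direct form of the same frequencies
direct = 1.0 / 10000.0 ** (np.arange(0, embed_dim, 2) / embed_dim)

print(np.allclose(div_term, direct))  # True
print(div_term)  # frequencies: 1.0, 0.1, 0.01, 0.001

For embed_dim = 8, each successive pair of dimensions gets a frequency ten times lower than the previous one.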

3. Initialize the positional encoding matrix

pe = np.zeros((seq_length, embed_dim))
  • This creates a matrix filled with zeros, with one row for each position and one column for each embedding dimension.

4. Fill the matrix with sine and cosine values

pe[:, 0::2] = np.sin(position * div_term)
pe[:, 1::2] = np.cos(position * div_term)
  • For even columns, fill with the sine of position * div_term.
  • For odd columns, fill with the cosine of position * div_term.
  • This alternation means every position gets a unique combination of values, and the pattern changes smoothly across positions and dimensions.
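Putting steps 1, 2, and 4 together (a sketch using the same seq_length = 6 and embed_dim = 8 as above), NumPy broadcasting does the heavy lifting:

import numpy as np

seq_length, embed_dim = 6, 8
position = np.arange(seq_length)[:, np.newaxis]  # shape (6, 1)
div_term = np.exp(np.arange(0, embed_dim, 2) * -(np.log(10000.0) / embed_dim))  # shape (4,)

angles = position * div_term  # broadcasts to shape (6, 4): one angle per (position, frequency)

pe = np.zeros((seq_length, embed_dim))
pe[:, 0::2] = np.sin(angles)  # even columns 0, 2, 4, 6
pe[:, 1::2] = np.cos(angles)  # odd columns 1, 3, 5, 7

print(pe[0])  # position 0: all sines are 0, all cosines are 1
print(pe[1])  # position 1: a distinct, smoothly varying pattern

Note that row 0 is always [0, 1, 0, 1, ...], since sin(0) = 0 and cos(0) = 1.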

5. Return the positional encoding

return pe
  • The resulting matrix gives you a unique encoding for each position in your sequence.
  • This encoding can be added to your word embeddings so the transformer model knows the order of the tokens.
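Here is a minimal sketch of that last step; the embedding matrix is random placeholder data, just to show that the shapes line up:

import numpy as np

# get_sinusoidal_positional_encoding is the function defined above
seq_length, embed_dim = 6, 8
token_embeddings = np.random.rand(seq_length, embed_dim)  # placeholder embeddings
pe = get_sinusoidal_positional_encoding(seq_length, embed_dim)

model_input = token_embeddings + pe  # element-wise sum, shape stays (6, 8)
print(model_input.shape)  # (6, 8)

Because the encoding is added rather than concatenated, the model's input dimensionality stays unchanged.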
