Generative AI
Recurrent Neural Networks (RNNs) and Sequence Generation
Introduction to Recurrent Neural Networks (RNNs)
Recurrent Neural Networks (RNNs) are a class of neural networks specifically designed for processing sequential data. Unlike traditional feedforward networks, RNNs have connections that allow information to persist across time steps, making them particularly useful for tasks where past information influences future predictions, such as language modeling, speech recognition, and sequence generation.
How RNNs Work
An RNN processes sequences one step at a time, maintaining a hidden state that captures information from previous inputs. At each time step:
- The network takes in the current input and the previous hidden state.
- It computes a new hidden state using a weighted transformation followed by a non-linear activation function.
- The hidden state is then used as input for the next time step and can also be used to generate an output.
Mathematically, an RNN is defined as:
$$h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$
where:
- $h_t$ is the hidden state at time $t$;
- $x_t$ is the input at time $t$;
- $W_{hh}$ and $W_{xh}$ are weight matrices;
- $b_h$ is a bias term;
- $f$ is a non-linear activation function (often tanh or ReLU).
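To make the recurrence concrete, here is a minimal sketch of a single RNN step in NumPy; the layer sizes and random weights are arbitrary illustrative choices, not values from any trained model.

```python
import numpy as np

# Illustrative sizes only -- chosen arbitrarily for this sketch
input_size, hidden_size = 3, 4

rng = np.random.default_rng(0)
W_hh = 0.1 * rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden weights
W_xh = 0.1 * rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights
b_h = np.zeros(hidden_size)                                   # bias term

def rnn_step(x_t, h_prev):
    """One recurrence step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Process a 5-step sequence, carrying the hidden state forward at each step
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h = rnn_step(x_t, h)

print(h.shape)  # (4,) -- the final hidden state summarizes the whole sequence
```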
RNNs capture dependencies within sequential data, but they suffer from issues such as the vanishing gradient problem, which limits their ability to learn long-range dependencies.
Variants of RNNs: LSTMs and GRUs
Standard RNNs struggle with long-term dependencies due to the vanishing gradient problem. To address this, more advanced architectures like Long Short-Term Memory (LSTMs) and Gated Recurrent Units (GRUs) were introduced.
Long Short-Term Memory (LSTMs)
LSTMs introduce memory cells and gating mechanisms to control the flow of information:
- Forget Gate: determines which past information to discard;
- Input Gate: decides what new information to store in memory;
- Output Gate: controls what information is sent as output.
LSTM equations:
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
where:
- $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, respectively;
- $C_t$ is the memory cell that retains long-term information;
- $\sigma$ represents the sigmoid function, which outputs values between 0 and 1, allowing selective information flow;
- $\tanh$ is the hyperbolic tangent function, which keeps values between -1 and 1 to normalize the update.
LSTMs effectively preserve long-term dependencies, making them highly effective for sequential tasks such as speech recognition and text generation.
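In practice, these gates are rarely coded by hand; deep learning frameworks ship ready-made LSTM layers. Below is a minimal sketch using PyTorch's `nn.LSTM`, where all sizes are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes
input_size, hidden_size, seq_len, batch = 8, 16, 10, 2

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)

x = torch.randn(batch, seq_len, input_size)  # a batch of input sequences
output, (h_n, c_n) = lstm(x)                 # output holds the hidden state at every step

print(output.shape)  # torch.Size([2, 10, 16]) -- h_t for each time step
print(h_n.shape)     # torch.Size([1, 2, 16])  -- final hidden state h_t
print(c_n.shape)     # torch.Size([1, 2, 16])  -- final memory cell C_t
```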
Gated Recurrent Units (GRUs)
GRUs simplify LSTMs by reducing the number of gates while still maintaining strong performance. They use:
- Update Gate: controls how much of the past information should be retained;
- Reset Gate: determines how much of the past information should be ignored.
GRU equations:
$$z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)$$
$$r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)$$
$$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where:
- $z_t$ (update gate) balances the old hidden state and the new information;
- $r_t$ (reset gate) helps discard irrelevant past information;
- $h_t$ is the updated hidden state at time $t$;
- $W_z$, $W_r$, and $W_h$ are weight matrices, and $b_z$, $b_r$, and $b_h$ are bias terms;
- $\odot$ represents element-wise multiplication.
GRUs require fewer parameters than LSTMs and are computationally efficient while still handling long-term dependencies effectively.
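The efficiency claim can be checked directly by counting parameters. A small sketch comparing `nn.LSTM` and `nn.GRU` in PyTorch with the same (arbitrary) sizes as above; the GRU ends up with roughly three quarters of the LSTM's parameters because it stacks three gate blocks per layer instead of four.

```python
import torch.nn as nn

input_size, hidden_size = 8, 16

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
gru = nn.GRU(input_size, hidden_size, batch_first=True)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM parameters:", count(lstm))  # 1664 = 4 * (hidden*(input+hidden) + 2*hidden)
print("GRU parameters: ", count(gru))   # 1248 = 3 * (hidden*(input+hidden) + 2*hidden)
```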
Sequence Generation with RNNs
RNNs are widely used in sequence generation, where the network predicts the next item in a sequence based on previous context. Common examples include:
- Text generation: predicting the next word in a sentence;
- Music composition: generating melodies based on a given style;
- Image captioning: generating descriptive text for images.
Example: Text Generation with RNNs
1. Train an RNN on a large text dataset.
2. Provide an initial word or phrase as input.
3. The RNN predicts the next word based on prior context.
4. The predicted word is fed back into the network for the next prediction.
5. Repeat this process to generate a coherent sequence.
This technique powers applications such as chatbots, AI-powered storytelling, and autocomplete systems.
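The sketch below is a hedged illustration of this feedback loop, using a character-level GRU in PyTorch; the toy vocabulary, the `CharRNN` class, and the greedy (argmax) sampling are simplifications invented for this example, and the model here is untrained, so the output only demonstrates the mechanics of the loop.

```python
import torch
import torch.nn as nn

# Toy character vocabulary (an assumption for this sketch)
vocab = list("abcdefghijklmnopqrstuvwxyz ")
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

class CharRNN(nn.Module):
    """Embed a character, run it through a GRU, project back to vocabulary logits."""
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, idx, h=None):
        x = self.embed(idx)       # (batch, seq, hidden)
        out, h = self.rnn(x, h)   # carry the hidden state across calls
        return self.head(out), h  # logits over the vocabulary

@torch.no_grad()
def generate(model, seed="hello ", steps=20):
    """Feed each predicted character back in as the next input."""
    idx = torch.tensor([[stoi[c] for c in seed]])
    logits, h = model(idx)  # encode the seed phrase
    out = seed
    next_idx = logits[:, -1].argmax(dim=-1, keepdim=True)
    for _ in range(steps):
        out += itos[next_idx.item()]
        logits, h = model(next_idx, h)  # one step at a time
        next_idx = logits[:, -1].argmax(dim=-1, keepdim=True)
    return out

model = CharRNN(len(vocab))  # untrained here; step 1 (training) is assumed to have happened
print(generate(model))
```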
Applications of RNNs in Generative AI
RNNs are utilized in various generative AI applications:
- Machine Translation: used in early models of Google Translate;
- Speech Recognition: converts spoken language into text (e.g., Siri, Google Assistant);
- AI-Based Content Generation: early versions of generative AI models before transformers;
- Music and Poetry Generation: AI models like OpenAI’s MuseNet generate compositions in different styles.
Conclusion
RNNs are essential for handling sequential data, but they struggle with long-term dependencies due to the vanishing gradient problem. LSTMs and GRUs mitigate this issue, making RNNs powerful for generative applications in text, speech, and music. However, modern architectures like Transformers have largely replaced RNNs in state-of-the-art generative AI models due to their ability to capture long-range dependencies more efficiently.
1. How does an RNN differ from a feedforward neural network?
2. Why are LSTMs and GRUs preferred over standard RNNs for long sequences?