Types of Generative AI Models

Generative AI models are designed to create new content by learning patterns from existing data. These models have the capability to generate a wide range of outputs, including text, images, music, videos, and even 3D objects.

Generative AI models can be broadly classified into two categories:

  1. Rule-Based Models: these models rely on predefined rules and logic to generate content. They are often simpler and less flexible but can be effective for specific tasks (a toy example follows this list);
  2. Deep Learning-Based Models: these models utilize neural networks to learn from vast amounts of data, enabling them to produce highly realistic and complex outputs. They are more adaptable and can handle a variety of creative tasks.
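
As a toy illustration of the first category, here is a minimal rule-based sketch: content comes entirely from hand-written templates and word lists, not from learned parameters. Every template and word below is an invented example, not something taken from the lesson.

```python
import random

# Hand-written rules: templates with slots and fixed word lists.
TEMPLATES = [
    "The {adjective} {animal} {verb} over the {object}.",
    "A {adjective} {animal} never {verb} near the {object}.",
]
WORDS = {
    "adjective": ["quick", "sleepy", "curious"],
    "animal": ["fox", "robot", "cat"],
    "verb": ["jumps", "wanders", "crawls"],
    "object": ["fence", "river", "keyboard"],
}

def generate_sentence() -> str:
    """Fill a randomly chosen template with randomly chosen words."""
    template = random.choice(TEMPLATES)
    return template.format(**{slot: random.choice(options) for slot, options in WORDS.items()})

print(generate_sentence())
```

The output is always grammatical within the templates, but the system can never produce anything outside the rules it was given, which is exactly the flexibility gap that deep learning-based models address.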

Modern Generative AI relies on deep learning-based models, which include:

  • Generative Adversarial Networks (GANs);
  • Variational Autoencoders (VAEs);
  • Transformer-Based Models;
  • Recurrent Neural Networks (RNNs) & Long Short-Term Memory (LSTMs);
  • Diffusion Models;
  • Neural Radiance Fields (NeRFs).

Each model type has a unique architecture that influences how it generates content, making them suitable for different applications in the field of AI.

1. Generative Adversarial Networks (GANs)

GANs consist of two competing neural networks that train together:

  • Generator: creates synthetic data;
  • Discriminator: distinguishes real data from fake.

Architecture of GANs

  1. Input:
    • The Generator starts with a random noise vector (latent space);
  2. Generator Module:
    • Uses fully connected layers to map noise into structured features;
    • Applies convolutional layers to refine the output (e.g., generating an image);
  3. Generated Output:
    • The Generator produces synthetic data (e.g., an image);
  4. Discriminator Module:
    • Uses convolutional layers to analyze the image;
    • Applies a classification layer to determine if the image is real or fake;
  5. Adversarial Training:
    • If the Discriminator correctly classifies the fake image, the Generator adjusts its parameters to improve;
    • This process repeats until the Generator produces highly realistic outputs (a minimal training-step sketch follows this list).
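
Below is a minimal sketch of one adversarial training step in PyTorch. The layer sizes, optimizer settings, and the random batch standing in for real images are illustrative assumptions, not details from the lesson; real GANs use convolutional generators and discriminators.

```python
import torch
import torch.nn as nn

latent_dim = 64

generator = nn.Sequential(           # noise vector -> synthetic 28x28 image (flattened)
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)
discriminator = nn.Sequential(       # image -> probability that it is real
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_images = torch.rand(32, 28 * 28)          # stand-in for a batch of real data
noise = torch.randn(32, latent_dim)
fake_images = generator(noise)

# Discriminator step: push predictions toward 1 for real images, 0 for fakes.
d_loss = bce(discriminator(real_images), torch.ones(32, 1)) + \
         bce(discriminator(fake_images.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to fool the discriminator into predicting 1 for fakes.
g_loss = bce(discriminator(fake_images), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Note the `detach()` in the discriminator step: it keeps the discriminator's update from flowing gradients back into the Generator, so each network is only updated on its own objective.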

Common Uses:

  • AI-generated images and deepfakes
  • Synthetic data generation
  • AI-driven artistic style transfer

2. Variational Autoencoders (VAEs)

VAEs are probabilistic models that learn a compressed representation of data and then reconstruct variations from it.

Architecture of VAEs

  1. Input Layer:
    • Accepts raw data (e.g., an image);
  2. Encoder Module:
    • Compresses the input into a latent space representation (lower-dimensional feature space);
    • Uses convolutional or fully connected layers;
  3. Latent Space:
    • Defines the probability distribution of features using mean and variance layers;
    • Adds random noise to allow variations in generated outputs;
  4. Decoder Module:
    • Reconstructs data from the latent representation;
    • Uses deconvolutional layers (upsampling) to generate new data;
  5. Output Layer:
    • Produces reconstructed data (e.g., a modified version of the input); see the sketch after this list.
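
The following is a minimal PyTorch sketch of this encoder, latent space, and decoder pipeline, including the reparameterization trick and the usual reconstruction-plus-KL training objective. All layer dimensions and the random input batch are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(input_dim, 128)
        self.fc_mu = nn.Linear(128, latent_dim)      # mean of the latent distribution
        self.fc_logvar = nn.Linear(128, latent_dim)  # log-variance of the latent distribution
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = F.relu(self.encoder(x))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample the latent code as mu + sigma * noise,
        # which adds the randomness that allows varied outputs.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

vae = TinyVAE()
x = torch.rand(8, 784)                     # stand-in for a batch of flattened images
recon, mu, logvar = vae(x)

# Training objective: reconstruction error plus a KL penalty that keeps
# the latent distribution close to a standard normal.
recon_loss = F.binary_cross_entropy(recon, x, reduction="sum")
kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl_loss
```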

Common Uses:

  • Data augmentation and synthetic data generation
  • Image generation with controlled variations
  • Anomaly detection

3. Transformer-Based Models

Transformers are the foundation of modern AI text models. Instead of processing data sequentially, they analyze entire input sequences at once using self-attention mechanisms.

Architecture of Transformers

  1. Input Embedding:
    • Converts words or tokens into vector representations;
    • Uses positional encoding to maintain word order;
  2. Self-Attention Module:
    • Determines which words in a sentence are important based on context;
    • Uses multi-head attention layers for deeper context understanding;
  3. Feedforward Network:
    • Processes self-attention outputs using fully connected layers;
    • Normalizes data with layer normalization;
  4. Output Layer:
    • Generates next-word predictions or translates text based on learned patterns; see the sketch after this list.
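
Here is a minimal sketch of one Transformer encoder block built from PyTorch's standard embedding, multi-head attention, and layer-normalization layers. The vocabulary size, embedding size, and sequence length are illustrative assumptions, and the learned positional embedding stands in for positional encoding in general.

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len, vocab_size = 64, 4, 10, 1000

tokens = torch.randint(0, vocab_size, (1, seq_len))     # toy token IDs for one sequence
embedding = nn.Embedding(vocab_size, embed_dim)
positions = nn.Embedding(seq_len, embed_dim)             # learned positional encoding

# Input embedding: token vectors plus position information.
x = embedding(tokens) + positions(torch.arange(seq_len)).unsqueeze(0)

# Self-attention: every token attends to every other token in the sequence at once.
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
attn_out, attn_weights = attention(x, x, x)

# Feedforward network with residual connections and layer normalization.
norm1, norm2 = nn.LayerNorm(embed_dim), nn.LayerNorm(embed_dim)
feedforward = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

x = norm1(x + attn_out)
x = norm2(x + feedforward(x))

# A final linear layer maps each position to next-token logits over the vocabulary.
logits = nn.Linear(embed_dim, vocab_size)(x)
print(logits.shape)   # torch.Size([1, 10, 1000])
```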

Common Uses:

  • AI-powered chatbots and text generation
  • Machine translation
  • AI-assisted programming

4. Diffusion Models

Diffusion models are a new class of generative AI models that produce high-quality, detailed images by gradually refining random noise into structured outputs. These models are particularly effective for AI-generated photography and digital art.

Unlike GANs, which rely on adversarial training, diffusion models learn by reversing a noising process: they start with pure noise and gradually reconstruct an image.

Architecture of Diffusion Models

  1. Forward Process (Adding Noise):
    • A real image is gradually corrupted by adding random noise over multiple steps;
    • After enough steps, the image becomes pure noise;
  2. Reverse Process (Denoising Step-by-Step):
    • A neural network learns to remove the noise step-by-step;
    • Each step restores details in the image;
    • The final output is a high-resolution generated image.

Key Modules in Diffusion Models

  • Noise Scheduler – determines how much noise is added at each step;
  • U-Net Backbone – a convolutional neural network that learns to denoise images;
  • Time Encoding Module – helps the model understand which step of the denoising process it is in (a forward-process sketch follows this list).
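
Below is a minimal sketch of the forward (noising) process under an assumed linear noise schedule; the step count, beta range, and random stand-in image are illustrative, and the denoising U-Net itself is only described in the comments.

```python
import torch

num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)          # noise added at each step (noise scheduler)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)     # fraction of the original signal left after t steps

def add_noise(x0, t):
    """Corrupt a clean image x0 into its noisy version at timestep t in one shot."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise, noise

x0 = torch.rand(1, 3, 64, 64)            # stand-in for a real image
x_t, target_noise = add_noise(x0, t=500)

# During training, a U-Net would take (x_t, t) and be trained to predict
# `target_noise`; at generation time, the trained network removes noise
# step by step, starting from pure Gaussian noise.
```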

Common Uses:

  • AI-generated artwork and photography;
  • Image restoration (removing blur and noise);
  • High-resolution video frame interpolation.

How Diffusion Models Improve Over GANs

Diffusion models provide greater stability, higher-quality outputs, and more diversity than GANs. While GANs rely on adversarial training, which can lead to unstable results and mode collapse, diffusion models gradually refine noise into detailed images, ensuring consistent quality. They also produce more diverse outputs, whereas GANs may generate repetitive content. However, diffusion models require longer computation times due to their step-by-step denoising process, making them slower but more reliable for high-quality image synthesis.

Conclusion

Modern Generative AI is built on four major families of deep learning models, each optimized for different tasks:

  • GANs specialize in deepfakes and AI-generated art;
  • VAEs are commonly used for data augmentation and anomaly detection;
  • Transformers are best suited for text generation;
  • Diffusion Models offer the highest-quality images with stable training.

Each model has unique advantages and continues to evolve, shaping the future of AI-driven creativity and automation.

1. Which Generative AI model type uses two competing networks to improve content generation?

2. Which model is best suited for text generation and natural language processing?

3. Which type of Generative AI model gradually refines noise to generate realistic images?
