Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lære Gradient Descent | Section
Python Math Module Essentials: Trigonometry, Logarithms, and Constants - 1769704232288

Gradient Descent

Sveip for å vise menyen

Note
Definition

Gradient Descent is an optimization algorithm that minimizes a function by iteratively adjusting its parameters in the direction of the steepest decrease. It is fundamental in machine learning for enabling models to learn efficiently from data.

Understanding Gradients

The gradient of a function represents the direction and steepness of the function at a given point. It tells us which way to move to minimize the function.

For a simple function:

J(θ)=θ2J(\theta) = \theta^2

The derivative (gradient) is:

J(θ)=ddθ(θ2)=2θ\nabla J(\theta) = \frac{d}{d \theta}\left(\theta^2\right)= 2\theta

This means that for any value of θθ, the gradient tells us how to adjust θθ to descend toward the minimum.

Gradient Descent Formula

The weight update rule is:

θθαJ(θ)\theta \larr \theta - \alpha \nabla J(\theta)

Where:

  • θ\theta - model parameter;
  • α\alpha - learning rate (step size);
  • J(θ)\nabla J(\theta) - gradient of the function we're aiming to minimize.

For our function:

θnew=θoldα(2θold)\theta_{\text{new}} = \theta_{\text{old}} - \alpha\left(2\theta_{old}\right)

This means we update θθ iteratively by subtracting the scaled gradient.

Stepwise Movement – A Visual

Example with start values: θ=3\theta = 3, α=0.3\alpha = 0.3

  1. θ1=30.3(2×3)=31.8=1.2;\theta_1 = 3 - 0.3(2 \times 3) = 3 - 1.8 = 1.2;
  2. θ2=1.20.3(2×1.2)=1.20.72=0.48;\theta_2 = 1.2 - 0.3(2 \times 1.2) = 1.2 - 0.72 = 0.48;
  3. θ3=0.480.3(2×0.48)=0.480.288=0.192;\theta_3 = 0.48 - 0.3(2\times0.48) = 0.48 - 0.288 = 0.192;
  4. θ4=0.1920.3(2×0.192)=0.1920.115=0.077.\theta_4 = 0.192 - 0.3(2 \times 0.192) = 0.192 - 0.115 = 0.077.

After a few iterations, we move toward θ=0θ=0, the minimum.

Learning Rate – Choosing α Wisely

  • Too large  α\ \alpha - overshoots, never converges;
  • Too small  α\ \alpha - converges too slowly;
  • Optimal  α\ \alpha - balances speed & accuracy.

When Does Gradient Descent Stop?

Gradient descent stops when:

J(θ)0\nabla J (\theta) \approx 0

This means that further updates are insignificant and we've found a minimum.

question mark

If the gradient J(θ)∇J(θ) is zero, what does this mean?

Velg det helt riktige svaret

Alt var klart?

Hvordan kan vi forbedre det?

Takk for tilbakemeldingene dine!

Seksjon 1. Kapittel 25

Spør AI

expand

Spør AI

ChatGPT

Spør om hva du vil, eller prøv ett av de foreslåtte spørsmålene for å starte chatten vår

Seksjon 1. Kapittel 25
some-alt