Implementing Gradient Descent in Python | Mathematical Analysis
Mathematics for Data Science

Implementing Gradient Descent in Python

Gradient descent follows a simple but powerful idea: move in the direction of steepest descent to minimize a function.

The mathematical rule is:

theta = theta - alpha * gradient(theta)

Where:

  • theta is the parameter we are optimizing;
  • alpha is the learning rate (step size);
  • gradient(theta) is the gradient of the function at theta.
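As a quick numeric check of this rule, one update step for f(theta) = theta**2 (whose gradient is 2*theta, as derived below), using the starting values alpha = 0.3 and theta = 3.0 from the sections that follow:

```python
alpha = 0.3           # learning rate (step size)
theta = 3.0           # current parameter value
grad = 2 * theta      # gradient of f(theta) = theta**2 at theta = 3.0
theta = theta - alpha * grad  # 3.0 - 0.3 * 6.0 = 1.2
print(theta)  # 1.2
```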

1. Define the Function and Its Derivative

We start with a simple quadratic function:

def f(theta):
    return theta**2  # Function we want to minimize

Its derivative (gradient) is:

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta

  • f(theta): this is our function, and we want to find the value of theta that minimizes it;
  • gradient(theta): this tells us the slope at any point theta, which we use to determine the update direction.
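To make the "update direction" point concrete, here is a small check of our own: the sign of the gradient tells us which way the update rule moves theta.

```python
def f(theta):
    return theta**2  # Function we want to minimize

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta

# The sign of the gradient determines the update direction:
print(gradient(3.0))   # 6.0  -> subtracting a positive value moves theta left
print(gradient(-3.0))  # -6.0 -> subtracting a negative value moves theta right
print(gradient(0.0))   # 0.0  -> at the minimum, the update leaves theta unchanged
```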

2. Initialize Gradient Descent Parameters

alpha = 0.3  # Learning rate
theta = 3.0  # Initial starting point
tolerance = 1e-5  # Convergence threshold
max_iterations = 20  # Maximum number of updates

  • alpha (learning rate): controls how big each step is;
  • theta (initial guess): the starting point for descent;
  • tolerance: when the updates become tiny, we stop;
  • max_iterations: ensures we don't loop forever.
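To see how much alpha matters, a small experiment of our own (the run_descent helper and the alpha values are illustrative additions, not part of the lesson's code) runs the same descent with different step sizes:

```python
def run_descent(alpha, theta=3.0, steps=20):
    """Run a fixed number of updates on f(theta) = theta**2."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # update rule with gradient 2*theta
    return theta

print(run_descent(0.3))   # close to 0: converges quickly
print(run_descent(0.01))  # still near 2 after 20 steps: far too slow
print(run_descent(1.1))   # magnitude grows each step: diverges
```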

3. Perform Gradient Descent

for i in range(max_iterations):
    grad = gradient(theta)  # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    print(f"Iteration {i + 1}: theta = {new_theta:.6f}")  # Monitor progress
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta

  • Calculate the gradient at theta;
  • Update theta using the gradient descent formula;
  • Stop when updates are too small (convergence);
  • Print each step to monitor progress.
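For this particular function the loop also has a simple closed form, which is a useful way to reason about convergence: each update rescales theta by the constant factor (1 - 2*alpha), so the iterates shrink exactly when that factor has magnitude below 1 (here, 0 < alpha < 1). A sketch of our own verifying this:

```python
alpha = 0.3
theta = 3.0
# theta - alpha * 2 * theta == (1 - 2 * alpha) * theta, so each
# iteration multiplies theta by the same constant factor.
factor = 1 - 2 * alpha  # 0.4

for _ in range(3):
    theta = theta - alpha * 2 * theta

print(theta)            # result of three loop iterations
print(3.0 * factor**3)  # same value from the closed form: 0.192...
```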

4. Visualizing Gradient Descent

import numpy as np
import matplotlib.pyplot as plt

def f(theta):
    return theta**2  # Function we want to minimize

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta

alpha = 0.3  # Learning rate
theta = 3.0  # Initial starting point
tolerance = 1e-5  # Convergence threshold
max_iterations = 20  # Maximum number of updates

theta_values = [theta]  # Record every visited point for plotting
for i in range(max_iterations):
    grad = gradient(theta)  # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    theta_values.append(new_theta)
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta

output_values = [f(t) for t in theta_values]  # f at each visited point
theta_range = np.linspace(-3.5, 3.5, 100)  # Smooth grid for the curve
output_range = f(theta_range)

plt.plot(theta_range, output_range, label="f(theta) = theta^2", color='black')
plt.scatter(theta_values, output_values, color='red', label="Gradient Descent Steps")
plt.xlabel("theta")
plt.ylabel("f(theta)")
plt.legend()
plt.show()

This plots:

  • The function curve f(theta) = theta^2;
  • Red dots showing gradient descent steps;
  • Axis labels and a legend for reading the plot.

1. What is the role of the learning rate (alpha) in gradient descent?

2. Why do we take the negative gradient in gradient descent?

3. What happens if the learning rate (alpha) is too large?

4. What is the stopping condition for gradient descent?

5. If our function is f(theta) = theta**2, what is the gradient descent update rule?

6. If we start at theta = 2.0 with alpha = 0.1 and gradient(theta) = 2 * theta, what is theta after the first update?

Section 3. Chapter 10
