Implementing Gradient Descent in Python
Gradient descent follows a simple but powerful idea: move in the direction of steepest descent to minimize a function.
The mathematical rule is:
theta = theta - alpha * gradient(theta)
Where:

- theta is the parameter we are optimizing;
- alpha is the learning rate (step size);
- gradient(theta) is the gradient of the function at theta.
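To make the rule concrete, here is a minimal sketch that wraps one update in a helper function; the name gradient_step is purely illustrative and is not used in the rest of the chapter:

def gradient_step(theta, alpha, gradient):
    # One application of the rule: theta <- theta - alpha * gradient(theta)
    return theta - alpha * gradient(theta)

# For f(theta) = theta**2 the gradient is 2*theta, so one step from
# theta = 3.0 with alpha = 0.3 gives 3.0 - 0.3 * 6.0 = 1.2
print(round(gradient_step(3.0, 0.3, lambda t: 2 * t), 6))  # 1.2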
1. Define the Function and Its Derivative
We start with a simple quadratic function:
def f(theta):
    return theta**2  # Function we want to minimize
Its derivative (gradient) is:
def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta
- f(theta): this is our function, and we want to find the value of theta that minimizes it;
- gradient(theta): this tells us the slope at any point theta, which we use to determine the update direction.
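As a quick sanity check (a sketch, assuming only the two definitions above), evaluate both at the starting point theta = 3.0 used later in this chapter:

def f(theta):
    return theta**2

def gradient(theta):
    return 2 * theta

print(f(3.0))         # 9.0 -> height of the curve at theta = 3.0
print(gradient(3.0))  # 6.0 -> positive slope, so the update will move theta to the left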
2. Initialize Gradient Descent Parameters
alpha = 0.3 # Learning rate
theta = 3.0 # Initial starting point
tolerance = 1e-5 # Convergence threshold
max_iterations = 20 # Maximum number of updates
- alpha (learning rate): controls how big each step is;
- theta (initial guess): the starting point for descent;
- tolerance: when the updates become tiny, we stop;
- max_iterations: ensures we don't loop forever.
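To see how the learning rate shapes the behavior here, note that for f(theta) = theta**2 each update multiplies theta by (1 - 2 * alpha), so alpha = 0.3 shrinks theta by a factor of 0.4 per step. The following is only an illustrative sketch; the values 0.1, 0.9, and 1.1 are not used elsewhere in this chapter:

for alpha in (0.1, 0.3, 0.9, 1.1):
    theta = 3.0
    for _ in range(5):
        theta = theta - alpha * (2 * theta)   # update for f(theta) = theta**2
    print(f"alpha = {alpha}: theta after 5 steps = {theta:.4f}")

With alpha = 0.1 or 0.3, theta shrinks steadily toward 0; with alpha = 0.9 it overshoots and oscillates around 0 but still converges; with alpha = 1.1 it diverges.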
3. Perform Gradient Descent
for i in range(max_iterations):
    grad = gradient(theta)            # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    print(f"Iteration {i + 1}: theta = {new_theta:.6f}")  # Monitor progress
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta
On each pass, the loop:

- calculates the gradient at theta;
- updates theta using the gradient descent formula;
- stops when updates are too small (convergence);
- prints each step to monitor progress.
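For reference, here is a short, self-contained trace of the first few updates under the same settings (theta = 3.0, alpha = 0.3); since each step multiplies theta by 0.4, the printed values shrink geometrically:

theta = 3.0
alpha = 0.3
for i in range(5):
    theta = theta - alpha * (2 * theta)   # same update rule as above
    print(f"Step {i + 1}: theta = {theta:.5f}")

# Prints 1.20000, 0.48000, 0.19200, 0.07680, 0.03072:
# each value is 0.4 times the previous one.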
4. Visualizing Gradient Descent
Putting everything together, we can rerun the descent while recording each theta value, then plot the steps on top of the function curve:

import numpy as np
import matplotlib.pyplot as plt

def f(theta):
    return theta**2  # Function we want to minimize

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta

alpha = 0.3            # Learning rate
theta = 3.0            # Initial starting point
tolerance = 1e-5       # Convergence threshold
max_iterations = 20    # Maximum number of updates

theta_values = [theta]  # Record every theta we visit for plotting

for i in range(max_iterations):
    grad = gradient(theta)            # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta
    theta_values.append(theta)

output_values = [f(t) for t in theta_values]  # f(theta) at each visited point

theta_range = np.linspace(-3.5, 3.5, 200)  # Smooth curve of the function
output_range = theta_range**2

plt.scatter(theta_values, output_values, color='red', label="Gradient Descent Steps")
plt.plot(theta_range, output_range, label="f(theta) = theta^2", color='black')
plt.annotate("", xy=(3.5, 12.25), xytext=(3.1, 9.61), arrowprops=dict(arrowstyle="->"))    # Curve keeps rising
plt.annotate("", xy=(-3.5, 12.25), xytext=(-3.1, 9.61), arrowprops=dict(arrowstyle="->"))  # ...on both sides
plt.xlabel("theta")
plt.ylabel("f(theta)")
plt.legend()
plt.show()
This plots:

- the function curve f(theta) = theta^2;
- red dots showing the gradient descent steps;
- arrows indicating that the function extends indefinitely.
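As a complementary view (a sketch, not part of the lesson's figure), you can also plot the function value against the iteration number to watch it fall toward zero:

import matplotlib.pyplot as plt

theta = 3.0
alpha = 0.3
history = [theta**2]              # f(theta) before any updates
for _ in range(20):
    theta = theta - alpha * (2 * theta)
    history.append(theta**2)      # f(theta) after each update

plt.plot(range(len(history)), history, marker='o')
plt.xlabel("Iteration")
plt.ylabel("f(theta)")
plt.title("Function value during gradient descent")
plt.show()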
Check your understanding:

1. What is the role of the learning rate (alpha) in gradient descent?
2. Why do we take the negative gradient in gradient descent?
3. What happens if the learning rate (alpha) is too large?
4. What is the stopping condition for gradient descent?
5. If our function is f(theta) = theta**2, what is the gradient descent update rule?
6. If we start at theta = 2.0 with alpha = 0.1 and gradient(theta) = 2 * theta, what is theta after the first update?