Implementing Gradient Descent in Python
Gradient descent follows a simple but powerful idea: move in the direction of steepest descent to minimize a function.
The mathematical rule is:
theta = theta - alpha * gradient(theta)
Where:
- theta is the parameter we are optimizing;
- alpha is the learning rate (step size);
- gradient(theta) is the gradient of the function at theta.
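As a minimal sketch, assuming the quadratic example used throughout this lesson (f(theta) = theta**2, so gradient(theta) = 2 * theta), a single application of the rule can be traced by hand:

theta = 3.0                     # Current parameter value
alpha = 0.3                     # Learning rate
grad = 2 * theta                # Gradient of theta**2 at theta = 3.0, i.e. 6.0
theta = theta - alpha * grad    # 3.0 - 0.3 * 6.0 = 1.2
print(theta)                    # ~1.2 (up to floating-point rounding)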
1. Define the Function and Its Derivative
We start with a simple quadratic function:
def f(theta):
    return theta**2  # Function we want to minimize
Its derivative (gradient) is:
def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta
- f(theta): this is our function, and we want to find the value of theta that minimizes it;
- gradient(theta): this tells us the slope at any point theta, which we use to determine the update direction.
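If you want to double-check the analytic derivative, one option (not required by the lesson) is to compare it against a central finite-difference approximation; numerical_gradient below is just an illustrative helper, not part of the lesson code:

def f(theta):
    return theta**2

def gradient(theta):
    return 2 * theta

def numerical_gradient(func, theta, h=1e-6):
    # Central difference: (f(theta + h) - f(theta - h)) / (2h)
    return (func(theta + h) - func(theta - h)) / (2 * h)

theta = 3.0
print(gradient(theta))               # 6.0 exactly (analytic derivative)
print(numerical_gradient(f, theta))  # Approximately 6.0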
2. Initialize Gradient Descent Parameters
alpha = 0.3 # Learning rate
theta = 3.0 # Initial starting point
tolerance = 1e-5 # Convergence threshold
max_iterations = 20 # Maximum number of updates
- alpha (learning rate): controls how big each step is;
- theta (initial guess): the starting point for descent;
- tolerance: when the updates become tiny, we stop;
- max_iterations: ensures we don't loop forever.
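To get a feel for how alpha affects the descent, the sketch below (the specific learning rates are arbitrary choices, not part of the lesson) runs a few updates from theta = 3.0 with different step sizes:

def gradient(theta):
    return 2 * theta

for alpha in (0.05, 0.3, 0.9, 1.1):           # Arbitrary learning rates for comparison
    theta = 3.0
    for _ in range(5):                         # A handful of updates is enough to see the trend
        theta = theta - alpha * gradient(theta)
    print(alpha, theta)                        # Small alpha: slow progress; alpha > 1: |theta| grows

For this particular quadratic, any alpha above 1.0 overshoots so badly that theta moves further from the minimum on every step.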
3. Perform Gradient Descent
for i in range(max_iterations):
    grad = gradient(theta)                  # Compute gradient
    new_theta = theta - alpha * grad        # Update rule
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta
    print(f"Iteration {i + 1}: theta = {theta:.6f}")  # Monitor progress
- Calculate the gradient at theta;
- Update theta using the gradient descent formula;
- Stop when updates are too small (convergence);
- Print each step to monitor progress.
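If you want to reuse this loop, it can be wrapped in a small function; gradient_descent below is our own sketch (the name and signature are illustrative, not from any library):

def gradient_descent(grad_fn, theta, alpha=0.3, tolerance=1e-5, max_iterations=20):
    """Minimize a function by gradient descent, given its gradient grad_fn."""
    for _ in range(max_iterations):
        grad = grad_fn(theta)                   # Compute gradient
        new_theta = theta - alpha * grad        # Update rule
        if abs(new_theta - theta) < tolerance:  # Convergence check
            return new_theta
        theta = new_theta
    return theta

def gradient(theta):
    return 2 * theta                            # Derivative of f(theta) = theta**2

print(gradient_descent(gradient, theta=3.0))    # Very close to 0, the minimizer of theta**2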
4. Visualizing Gradient Descent
The complete script below repeats the setup, records every value of theta during the descent, and plots those steps on top of the function curve:

import numpy as np
import matplotlib.pyplot as plt

def f(theta):
    return theta**2                               # Function we want to minimize

def gradient(theta):
    return 2 * theta                              # Derivative: f'(theta) = 2*theta

alpha = 0.3                                       # Learning rate
theta = 3.0                                       # Initial starting point
tolerance = 1e-5                                  # Convergence threshold
max_iterations = 20                               # Maximum number of updates

theta_values = [theta]                            # Record each theta for plotting
for i in range(max_iterations):
    grad = gradient(theta)                        # Compute gradient
    new_theta = theta - alpha * grad              # Update rule
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta
    theta_values.append(theta)

output_values = [f(t) for t in theta_values]      # f(theta) at each recorded step
theta_range = np.linspace(-3.5, 3.5, 200)         # Smooth range for the curve
output_range = f(theta_range)

plt.plot(theta_range, output_range, label="f(theta) = theta^2", color='black')
plt.scatter(theta_values, output_values, color='red', label="Gradient Descent Steps")
# Arrows at the ends of the curve to suggest the function extends indefinitely
plt.annotate("", xy=(3.5, f(3.5)), xytext=(3.0, f(3.0)), arrowprops=dict(arrowstyle="->"))
plt.annotate("", xy=(-3.5, f(-3.5)), xytext=(-3.0, f(-3.0)), arrowprops=dict(arrowstyle="->"))
plt.xlabel("theta")
plt.ylabel("f(theta)")
plt.legend()
plt.show()
This plots:
- The function curve f(theta) = theta^2;
- Red dots showing the gradient descent steps;
- Arrows indicating that the function extends indefinitely.
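Another way to look at convergence (not shown in the lesson) is to plot the function value against the iteration number; the self-contained sketch below reruns the same descent and does exactly that:

import matplotlib.pyplot as plt

def f(theta):
    return theta**2

def gradient(theta):
    return 2 * theta

alpha, theta = 0.3, 3.0
f_values = [f(theta)]                         # f(theta) before any update
for _ in range(20):
    theta = theta - alpha * gradient(theta)   # Same update rule as above
    f_values.append(f(theta))

plt.plot(range(len(f_values)), f_values, marker='o')
plt.xlabel("Iteration")
plt.ylabel("f(theta)")
plt.title("Gradient descent: f(theta) per iteration")
plt.show()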
Before moving on, check your understanding:
1. What is the role of the learning rate (alpha) in gradient descent?
2. Why do we take the negative gradient in gradient descent?
3. What happens if the learning rate (alpha) is too large?
4. What is the stopping condition for gradient descent?
5. If our function is f(theta) = theta**2, what is the gradient descent update rule?
6. If we start at theta = 2.0 with alpha = 0.1 and gradient(theta) = 2 * theta, what is theta after the first update?