Summary  
Gradient descent is an iterative optimization algorithm that updates parameters by moving them in the direction of steepest descent—determined by the gradient and a learning rate—until a function’s minimum is reached.

General domain of usage  
Training machine learning models

**Gradient Descent** is an optimization algorithm that minimizes a function by iteratively adjusting its parameters in the direction of the steepest decrease. It is fundamental in machine learning for enabling models to learn efficiently from data.


Definition

## Understanding Gradients
The gradient of a function represents the **direction and steepness** of the function at a given point. It tells us **which way to move** to minimize the function.

For a simple function:

$$
J(\theta) = \theta^2
$$
 
The derivative (gradient) is:

$$
\nabla J(\theta) = \frac{d}{d \theta}\left(\theta^2\right)= 2\theta
$$

This means that for any value of $$θ$$, the gradient tells us how to adjust $$θ$$ to **descend** toward the minimum.


## Gradient Descent Formula
The **weight update rule** is:

$$
\theta \larr \theta - \alpha \nabla J(\theta)
$$

Where:
- $$\theta$$ - model parameter;
- $$\alpha$$ - learning rate (step size);
- $$\nabla J(\theta)$$ - gradient of the function we're aiming to minimize.

For our function:

$$
\theta_{\text{new}} = \theta_{\text{old}} - \alpha\left(2\theta_{old}\right)
$$

This means we update $$θ$$ **iteratively** by subtracting the **scaled** gradient.


## Stepwise Movement – A Visual
Example with start values: $$\theta = 3$$, $$\alpha = 0.3$$

1. $$\theta_1 = 3 - 0.3(2 \times 3) = 3 - 1.8 = 1.2;$$	 
2. $$\theta_2 = 1.2 - 0.3(2 \times 1.2) = 1.2 - 0.72 = 0.48;$$	 
3. $$\theta_3 = 0.48 - 0.3(2\times0.48) = 0.48 - 0.288 = 0.192;$$ 
4. $$\theta_4 = 0.192 - 0.3(2 \times 0.192) = 0.192 - 0.115 = 0.077.$$
 
After a few iterations, we move toward $$θ=0$$, the **minimum**.


## Learning Rate – Choosing α Wisely
- **Too large** $$\ \alpha$$ - overshoots, never converges;
- **Too small** $$\ \alpha$$ - converges too slowly;
- **Optimal** $$\ \alpha$$ - balances speed & accuracy.


## When Does Gradient Descent Stop?
Gradient descent stops when:
 
$$
\nabla J (\theta) \approx 0
$$

This means that further updates are **insignificant** and we've found a **minimum**.


If the gradient $$∇J(θ)$$ is zero, what does this mean?

This course distills the essentials of Python's math module, focusing on trigonometric functions, logarithmic operations, and mathematical constants. Learners will gain hands-on experience using Python to perform these standard mathematical computations, ideal for beginners seeking practical skills with the math module.