Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Gradient Descent Method | Mathematical Analysis
Mathematics for Data Analysis and Modeling
course content

Course Content

Mathematics for Data Analysis and Modeling

Mathematics for Data Analysis and Modeling

1. Basic Mathematical Concepts and Definitions
2. Linear Algebra
3. Mathematical Analysis

book
Gradient Descent Method

We know how to solve optimization problems for the function of one variable using the algorithm described in the previous chapter. But what can we do if we have a function of multiple variables? We can turn to a numerical method - gradient descent.

What is gradient descent?

Gradient is a vector that consists of all partial derivatives of the function:

Thus, the problem of minimisation of function F(x1,..,xn) can be solved by constructing the following sequence of approximations:

We set a certain initial value x0 and η value representing the speed of gradient descent. Then we start the iterative process according to the formula above.

Stop criteria of the algorithm

The criteria for stopping iterations can be as follows:

  1. Stop the algorithm after a certain number of iterations;
  2. Iterate until the following condition is met:

Note

eps = 10**(-6) or eps = 10**(-9) values are commonly used as the stop criterion of the iteration process.

We have to pay attention to two important features of the gradient descent method:

  1. This method can only find the point of minimum of the function F(x). If you want to find a point of maximum, you can consider function -F(x) and use gradient descent for it;
  2. If we compare the algorithm we discussed earlier with gradient descent, we can see that gradient descent performs a similar task to the first stage of the algorithm - finding a critical value, which might be a potential minimum point. As a result, it is possible that the point found by gradient descent may either be a local minimum within some subset of the domain or not a minimum point at all.

Example

Let's find out how to solve the optimization problem in Python:

12345678910111213141516171819
import numpy as np from scipy.optimize import minimize # Define the Rosenbrock function def rosenbrock(x): return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2 # Initial guess for x and y x0 = np.array([2, 2]) # Use scipy.optimize.minimize to find the minimum of the Rosenbrock function result = minimize(rosenbrock, x0) # Extract the optimal x and the minimum value of the Rosenbrock function optimal_x = result.x min_value = result.fun print(f'Optimal x: {optimal_x}') print(f'Minimum value of the Rosenbrock function: {min_value:.4f}')
copy

In this example, we define the Rosenbrock function, set an initial guess for x, and then use scipy.optimize.minimize to find the minimum of the Rosenbrock function. The result.x attribute contains the optimal x, and result.fun contains the minimum value of the Rosenbrock function.

Note

The Rosenbrock function is often used as a benchmark for testing and comparing optimization algorithms due to its non-convex nature and the presence of a narrow, curved minimum valley.

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 3. Chapter 6
We're sorry to hear that something went wrong. What happened?
some-alt