Gradient Descent
Gradient Descent is, in general, an iterative (and often stochastic) approximation method for minimizing a function of interest. In Machine Learning, this function is called the objective function or cost function, and the process of minimizing it yields a "learned" model.
This technique is one of the most commonly used minimization procedures in ML.
Gradient descent minimizes the objective function $J(\theta)$ (aka the surface), where $\theta$ are the parameters of the system, by updating these parameters in the direction opposite to their gradient (see the sketch after the list below):

$\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$

Where:
- $\theta$ = the set of parameters
- $J(\theta)$ = the function or surface to be optimized
- $\eta$ = the learning rate, a hyperparameter that must be tuned to achieve good convergence.
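
As an illustration, here is a minimal sketch of the update rule in Python, applied to a toy quadratic objective $J(\theta) = \lVert \theta - \theta^* \rVert^2$. The objective, the target $\theta^*$, and the hyperparameter values are illustrative choices, not part of the definition above.

```python
import numpy as np

# Toy objective: J(theta) = ||theta - target||^2,
# whose gradient with respect to theta is 2 * (theta - target).
def grad_J(theta, target):
    return 2.0 * (theta - target)

theta = np.zeros(2)              # initial parameters
target = np.array([3.0, -1.0])   # minimizer of the toy objective
eta = 0.1                        # learning rate (hyperparameter)

for step in range(100):
    # Core update: theta <- theta - eta * grad J(theta)
    theta = theta - eta * grad_J(theta, target)

print(theta)  # converges toward [3.0, -1.0]
```

Because this toy objective is convex, a small fixed learning rate suffices; on real objectives, $\eta$ often needs tuning or a decay schedule to converge well.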
See also: minibatch-gradient-descent, batch-gradient-descent, stochastic-gradient-descent