(This article was originally published at Analysis with Programming, and syndicated at StatsBlogs.)
One of the problems often dealt with in statistics is the minimization of an objective function. In contrast to linear models, there is no analytical solution for models that are nonlinear in the parameters, such as logistic regression, neural networks, and nonlinear regression models (like the Michaelis-Menten model). In these situations we have to use mathematical programming, or optimization, and one popular optimization algorithm is gradient descent, which we're going to illustrate here.

To start with, let's consider a simple function with a closed-form solution, given by
\begin{equation}
f(\beta) \triangleq \beta^4 - 3\beta^3 + 2.
\end{equation}
We want to minimize this function with respect to $\beta$. The quick solution, as calculus taught us, is to compute the first derivative of the function, that is,
\begin{equation}
\frac{\text{d}f(\beta)}{\text{d}\beta} = 4\beta^3 - 9\beta^2.
\end{equation}
Setting this to 0 to obtain the stationary point gives us
\begin{align}
\frac{\text{d}f(\beta)}{\text{d}\beta} &\overset{\text{set}}{=} 0\nonumber\\
4\hat{\beta}^3 - 9\hat{\beta}^2 &= 0\nonumber\\
4\hat{\beta}^3 &= 9\hat{\beta}^2\nonumber\\
4\hat{\beta} &= 9\nonumber\\
\hat{\beta} &= \frac{9}{4},
\end{align}
where dividing both sides by $\hat{\beta}^2$ discards the other stationary point at $\hat{\beta}=0$, which is not a minimum. The following plot shows the minimum of the function at $\hat{\beta}=\frac{9}{4}$ (red line in the plot below); a sketch of an R script that reproduces this figure is given after the worked example below.

Now let's consider minimizing this problem using gradient descent with the following algorithm:
- Initialize $\mathbf{x}_{r}$, $r=0$
- while $\lVert \mathbf{x}_{r}-\mathbf{x}_{r+1}\rVert > \nu$
- $\mathbf{x}_{r+1}\leftarrow \mathbf{x}_{r} - \gamma\nabla f(\mathbf{x}_r)$
- $r\leftarrow r + 1$
- end while
- return $\mathbf{x}_{r}$ and $r$
where $\nabla f(\mathbf{x}_r)$ is the gradient of the cost function, $\gamma$ is the learning-rate parameter of the algorithm, and $\nu$ is the precision parameter. For the function above, let the initial guess be $\hat{\beta}_0=4$ and $\gamma=.001$ with $\nu=.00001$. Then $\nabla f(\hat{\beta}_0)=112$, so that \[\hat{\beta}_1=\hat{\beta}_0-.001(112)=3.888.\] And $|\hat{\beta}_1 - \hat{\beta}_0| = 0.112 > \nu$. Repeat the process until at some $r$, $|\hat{\beta}_{r}-\hat{\beta}_{r+1}| \ngtr \nu$. It will turn out that 350 iterations are needed to satisfy the desired inequality.
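To make the steps concrete, here is a minimal R sketch. It is not the original post's hidden R Script; the names `f` and `grad_f`, the plotting range, and the loop structure are my own choices, but it reproduces the figure above and runs the gradient-descent update with the same starting value, learning rate, and precision.

```r
# Objective function f(beta) = beta^4 - 3*beta^3 + 2 and its gradient
f      <- function(beta) beta^4 - 3 * beta^3 + 2
grad_f <- function(beta) 4 * beta^3 - 9 * beta^2

# Plot the function and mark the analytical minimum at beta = 9/4 (red line)
curve(f, from = -1, to = 3.5, lwd = 2,
      xlab = expression(beta), ylab = expression(f(beta)))
abline(v = 9 / 4, col = "red")

# Gradient descent with beta_0 = 4, gamma = 0.001, nu = 0.00001
beta_old <- 4        # initial guess, beta_0
gamma    <- 0.001    # learning-rate parameter
nu       <- 0.00001  # precision parameter
r        <- 0        # iteration counter

repeat {
  beta_new <- beta_old - gamma * grad_f(beta_old)  # x_{r+1} <- x_r - gamma * grad f(x_r)
  r <- r + 1
  if (abs(beta_new - beta_old) <= nu) break        # stop once the change is no longer > nu
  beta_old <- beta_new
}

beta_new  # approximately 9/4 = 2.25
r         # number of iterations (about 350, as noted above)
```

On the first pass this reproduces the update $\hat{\beta}_1 = 4 - .001(112) = 3.888$ computed above, and the loop stops once consecutive estimates differ by no more than $\nu$.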