Showing posts with label optimization. Show all posts

Thursday, February 4, 2016

Revisit Newton-Raphson

Target audience: Intermediate

Estimated reading time: 3'

The Newton-Raphson method stands out for its straightforwardness, both conceptually and in execution. We'll delve into two distinct Scala implementations and wrap up the discussion by weighing the pros and cons of the Newton-Raphson approach.

Table of contents

First implementation

Second derivative to the rescue

Summary

References

The Newton-Raphson is a well-know technique to compute the root of a function, f(x) = 0 or the minimum/maximum of a function using its derivative f'(x) = 0.

First implementation

Let's dive into the computation of the root x of a function f. A function is defined by its Taylor series of derivatives as follow:
\[f(s)=\sum_{0}^{\infty }\frac{f^{(n)}(x)}{n!}(s-x)^{n}\] In case of the root f(s), the equation becomes
\[0=f(x)+f'(x)(x-s)+O((x-s)^{2}f"(s))\] resulting to
\[x_{n+1}=x_{n}- \frac{f(x_{n})}{f'(x_{n})}\]

final class NewtonRaphson(
   f:   Double => Double,
   df:  Double => Double,
   dff: Option[Double => Double] = None
) {
   final val EPS = 0.01

   @scala.annotation.tailrec
   def root(z: Double): Double = dff.map(_dff => {
       val nextX = x - f(x) / df(x)
       if (Math.abs(nextX - x) < EPS) nextX 
       else root(nextX)
    }
}

The constructor (lines 2 to 4) takes 3 arguments:

Function which root is to be computed
First order derivative
Optional second order derivative that is not used in this first implementation

The formula is implemented by the method root (lines 9 to 12). Let's apply the root method to a concrete example, with function f and its derivative ff:

val f = (x: Double) => Math.pow(x, 5.0) - 2.0*x
val ff = (x: Double) => 5.0*Math.pow(x,4.0) -2.0
 
val nr1 = new NewtonRaphson(f, ff)
val z1 = nr1.root(7.9)

So far so good. However it is assumed that the function has a single root, or in the case of a maximum/minimum, its derivative has single root, at least in the vicinity of the initial data point. What about computing the root of a function which may have no or several root?
Let's look at the following function g and its derivative gg

val g = (x: Double) => Math.exp(0.5 * x)
val gg = (x: Double) => 0.5 * g(x)

val nr3 = new NewtonRaphson(g, gg)

The call to the Newton-Raphson method nr5 will diverge. In this particular case, the computation generates a Double.NaN, at each iteration.
There are several options to guarantee that the method will properly exit in the case no root exists. Let's look at one solution that leverages on the second derivative.

Second derivative to the rescue

We need to go back to basics: in its simplest form, the Newton-Raphson does not take into account the derivative of order higher than 1. The second order derivative, f" can be used as rough approximation of the error on the approximation of the root. The error san be evaluated at each iteration against a convergence criteria EPS.

\[\Delta \varepsilon = \varepsilon _{n+1}- \varepsilon _{n} \sim \frac{f'(x_{n})}{f"(x_{n})} < EPS\] \[x_{n+1}=x_{n}-\frac{f(x_{n})}{f'(x_{n})}(1 + \Delta \varepsilon)\] The approximation of the error is also used to validate whether the algorithm converges toward a solution (root) or starts to diverge. Let's implement this variation of the Newton-Raphson using the second order derivative as a convergence criteria.

@scala.annotation.tailrec
def root(z: Double): Double = dff.map(_dff => {

   @scala.annotation.tailrec
   def nr2ndOrder(xe: (Double, Double)): (Double, Double) = {
      val x = xe._1
      val eps = xe._2
      
      val nextEps = Math.abs(df(x) / _dff(x))
      val nextX = (x - f(x) *(1.0 + nextEps)/df(x))

      // rules to exit recursion
      if (eps > 0.0 && nextEps >= eps)
         (-1.0, -1.0)
      else if (Math.abs(nextEps - eps) < EPS) 
         (nextX, nextEps)
      else 
         nr2ndOrder((nextX, nextEps))
    }
  
    nr2ndOrder((z, 0.0))._1
}

 ).getOrElse(DIVERGE)

Once again, the computation of the root is implemented as an efficient tail recursion defined in the method nr2ndOrder (line 5)
The convergence criteria is implemented as follow:

if error, eps increases, exit the recursion => Diverge (lines 13 and 22)
if eps decreases with a value within the convergence criteria => Converged (line 15)
if eps decreases with a value higher than the convergence criteria => Recurse (line 18)

Summary

Let's compare the convergence of the simplest Newton-Raphson method with the convergence its variant using the 2nd order derivative, using the function f defined in the first paragraph.

Clearly, the adaptive step size using the second order derivative speeds up the convergence toward the root of the function f. The selection of the appropriate method comes down to the trade-off between the overhead of the computation of the second derivative and the larger number of iterations or recursions required to find the root within a predefined convergence criteria.

References

Programming in Scala 2nd Edition M. Odersky, L. Spoon, B. Venners - Artima - 2012
The Newton-Raphson method
github.com/patnicolas

Monday, November 9, 2015

Time Complexity for Graph & ML Algorithms

Target audience: Beginners

Estimated reading time: 3'

Posts history

Asymptotic time complexity gives a high-level understanding of an algorithm's efficiency and scalability, which is crucial for comparing algorithms, predicting their performance, and identifying potential bottlenecks.
This short article lists the time complexity for the most common graph, linear algebra and machine learning algorithms.

The most used Big O notation defines the upper bound on the time complexity, so the runtime does not exceed a certain threshold as the size of input grows indefinitely.

Overview

Time complexity (or worst case scenario for the duration of execution given a number of elements) is commonly used in computing science. However, you will be hard pressed to find a comparison of machine learning algorithms using their asymptotic execution time.

Summary

The following summary list the time complexity of some of the most common algorithms used in machine learning including, recursive computation for dynamic programming, linear algebra, optimization and discriminative supervised training.

Algorithm	Time complexity
Hash table search	1
Array	N
Binary tree	Log N
KD tree	N
Recursion with one element reduction	N²
Recursion with halving reduction	logN
Recursion with halving reduction	N
Naïve Bayes	N.D.C
Nearest neighbors search	D.logk.n.Logn
PCA	D²n + n³
Linear regression	D²n + n³
Logistic regression	InDC
Matrix multiplication (a, b) x (b, c)	a.b.c
Matrix multiplication (a, a)	a³
Matrix multiplication (a, a) Strassen	a^2.8
Partial eigenvalues extraction	e.n²
Complete eigenvalues extraction	n³
Minimum spanning tree Prim linear queue	V²
Minimum spanning tree Prim binary heap	(E + V).logV
Minimum spanning tree Prim Fibonacci h	V.logV
Shortest paths Graph Dijkstra linear sort	V²
Shortest paths Graph Dijkstra binary heap	(E + V).logV
Shortest paths Graph Dijkstra Fibonacci	V.logV
Shortest paths kNN Graph - Dijkstra	(k + logN).N²
Shortest paths kNN Graph Floyd-Warshall	N³
Fast Fourier transform	n.logn
Batched gradient descent	n.P.ep
Stochastic gradient descent	n.P.ep
Newton with Hessian computation	n³.ep
Conjugate gradient	n.n0.sqrt(cd)
L-BFGS	n.g.I
K-means (*)	C.n.D.I
Decision tree	n².D
Lasso regularization - LARS(*)	n.D.min(n,D)
Hidden Markov Model Forward/backward	s².n
Multilayer Perceptron (*)	i.h.o.n.ep
Support vector machine (*) Newton	D²n + n³
Support vector machine (*) Cholesky	D²n + n²
Support vector machine (*) - SMO	D²n + n²
Principal Components Analysis (*)	D²n + n³
Expectation-Maximization (*)	D²n
Laplacian Eigenmaps	D.logk.n.logn + m.n.k² + D.N²
Genetic algorithms	pz.logpz.I.G
Feed forward neural network	n.h.(i + h²/H).H

N	Number elements
n	Number of observations
D	Dimension/features
C	Number of classes
I	Number of iterations
ep	Number of epochs
k	Number of neighbors
a	1^st dimension matrix
b	2^nd dimension of matrix
c	3^rd dimension of matrix
e	Number of eigenvalues
V	Number of vertices
E	Number of edges
P	Number of variables
cd	Condition of matrix
n0	Number of non-zero
g	Number of gradients
s	Number of states
i	Number of input variables
h	Number of hidden neurons
H	Number of hidden layers
o	Number output variables
G	Number of genes
pz	Population size

References

Introduction to Algorithms T. Cormen, C. Leiserson, R. Rivest - MIT Press 1990
Machine Learning: A probabilistic Perspective K. Murphy - MIT Press, 2012
Big O cheat sheet
github.com/patnicolas

Thursday, February 4, 2016

Revisit Newton-Raphson

First implementation

Second derivative to the rescue

Summary

References

Monday, November 9, 2015

Time Complexity for Graph & ML Algorithms

Overview

Summary

Contact Form

Equation Editor

Popular Posts