Showing posts with label optimization. Show all posts
Showing posts with label optimization. Show all posts

Thursday, February 4, 2016

Revisit Newton-Raphson

Target audience: Intermediate
Estimated reading time: 3'

The Newton-Raphson method stands out for its straightforwardness, both conceptually and in execution. We'll delve into two distinct Scala implementations and wrap up the discussion by weighing the pros and cons of the Newton-Raphson approach.


Table of contents
Follow me on LinkedIn

The Newton-Raphson is a well-know technique to compute the root of a function, f(x) = 0 or the minimum/maximum of a function using its derivative f'(x) = 0.

First implementation

Let's dive into the computation of the root x of a function f. A function is defined by its Taylor series of derivatives as follow:
\[f(s)=\sum_{0}^{\infty }\frac{f^{(n)}(x)}{n!}(s-x)^{n}\] In case of the root f(s), the equation becomes
\[0=f(x)+f'(x)(x-s)+O((x-s)^{2}f"(s))\] resulting to
\[x_{n+1}=x_{n}- \frac{f(x_{n})}{f'(x_{n})}\]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
final class NewtonRaphson(
   f:   Double => Double,
   df:  Double => Double,
   dff: Option[Double => Double] = None
) {
   final val EPS = 0.01

   @scala.annotation.tailrec
   def root(z: Double): Double = dff.map(_dff => {
       val nextX = x - f(x) / df(x)
       if (Math.abs(nextX - x) < EPS) nextX 
       else root(nextX)
    }
}

The constructor (lines 2 to 4) takes 3 arguments:
  • Function which root is to be computed
  • First order derivative
  • Optional second order derivative that is not used in this first implementation
The formula is implemented by the method root (lines 9 to 12). Let's apply the root method to a concrete example, with function f and its derivative ff:

1
2
3
4
5
val f = (x: Double) => Math.pow(x, 5.0) - 2.0*x
val ff = (x: Double) => 5.0*Math.pow(x,4.0) -2.0
 
val nr1 = new NewtonRaphson(f, ff)
val z1 = nr1.root(7.9)

So far so good. However it is assumed that the function has a single root, or in the case of a maximum/minimum, its derivative has single root, at least in the vicinity of the initial data point. What about computing the root of a function which may have no or several root?
Let's look at the following function g and its derivative gg


1
2
3
4
val g = (x: Double) => Math.exp(0.5 * x)
val gg = (x: Double) => 0.5 * g(x)

val nr3 = new NewtonRaphson(g, gg)

The call to the Newton-Raphson method nr5 will diverge. In this particular case, the computation generates a Double.NaN, at each iteration.
There are several options to guarantee that the method will properly exit in the case no root exists. Let's look at one solution that leverages on the second derivative.

Second derivative to the rescue

We need to go back to basics: in its simplest form, the Newton-Raphson does not take into account the derivative of order higher than 1. The second order derivative, f" can be used as rough approximation of the error on the approximation of the root. The error san be evaluated at each iteration against a convergence criteria EPS.
\[\Delta \varepsilon = \varepsilon _{n+1}- \varepsilon _{n} \sim \frac{f'(x_{n})}{f"(x_{n})} < EPS\] \[x_{n+1}=x_{n}-\frac{f(x_{n})}{f'(x_{n})}(1 + \Delta \varepsilon)\] The approximation of the error is also used to validate whether the algorithm converges toward a solution (root) or starts to diverge. Let's implement this variation of the Newton-Raphson using the second order derivative as a convergence criteria.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
@scala.annotation.tailrec
def root(z: Double): Double = dff.map(_dff => {

   @scala.annotation.tailrec
   def nr2ndOrder(xe: (Double, Double)): (Double, Double) = {
      val x = xe._1
      val eps = xe._2
      
      val nextEps = Math.abs(df(x) / _dff(x))
      val nextX = (x - f(x) *(1.0 + nextEps)/df(x))

      // rules to exit recursion
      if (eps > 0.0 && nextEps >= eps)
         (-1.0, -1.0)
      else if (Math.abs(nextEps - eps) < EPS) 
         (nextX, nextEps)
      else 
         nr2ndOrder((nextX, nextEps))
    }
  
    nr2ndOrder((z, 0.0))._1
}
 ).getOrElse(DIVERGE)

Once again, the computation of the root is implemented as an efficient tail recursion defined in the method nr2ndOrder (line 5)
The convergence criteria is implemented as follow:
  • if error, eps increases, exit the recursion => Diverge (lines 13 and 22)
  • if eps decreases with a value within the convergence criteria => Converged (line 15)
  • if eps decreases with a value higher than the convergence criteria => Recurse (line 18)

Summary

Let's compare the convergence of the simplest Newton-Raphson method with the convergence its variant using the 2nd order derivative, using the function f defined in the first paragraph.

Clearly, the adaptive step size using the second order derivative speeds up the convergence toward the root of the function f. The selection of the appropriate method comes down to the trade-off between the overhead of the computation of the second derivative and the larger number of iterations or recursions required to find the root within a predefined convergence criteria.

References

Monday, November 9, 2015

Time Complexity for Graph & ML Algorithms

Target audience: Beginners
Estimated reading time: 3'


Asymptotic time complexity gives a high-level understanding of an algorithm's efficiency and scalability, which is crucial for comparing algorithms, predicting their performance, and identifying potential bottlenecks.
This short article lists the time complexity for the most common graph, linear algebra and machine learning algorithms.


The most used Big O notation defines the upper bound on the time complexity, so the runtime does not exceed a certain threshold as the size of input grows indefinitely.


Follow me on LinkedIn

Overview

Time complexity (or worst case scenario for the duration of execution given a number of elements) is commonly used in computing science. However, you will be hard pressed to find a comparison of machine learning algorithms using their asymptotic execution time.

Summary

The following summary list the time complexity of some of the most common algorithms used in machine learning including, recursive computation for dynamic programming, linear algebra, optimization and discriminative supervised training.

Algorithm

Time complexity

Hash table search

1

Array

N

Binary tree

Log N

KD tree

N

Recursion with one element reduction

N2

Recursion with halving reduction

logN

Recursion with halving reduction

N

Naïve Bayes

N.D.C

Nearest neighbors search

D.logk.n.Logn

PCA

D2n + n3

Linear regression

D2n + n3

Logistic regression

InDC

Matrix multiplication (a, b) x (b, c)

a.b.c

Matrix multiplication (a, a)

a3

Matrix multiplication (a, a) Strassen

a2.8

Partial eigenvalues extraction

e.n2

Complete eigenvalues extraction

n3

Minimum spanning tree Prim linear queue

V2

Minimum spanning tree Prim binary heap

(E + V).logV

Minimum spanning tree Prim Fibonacci h

V.logV

Shortest paths Graph Dijkstra linear sort

V2

Shortest paths Graph Dijkstra binary heap

(E + V).logV

Shortest paths Graph Dijkstra Fibonacci 

V.logV

Shortest paths kNN Graph - Dijkstra

(k + logN).N2

Shortest paths kNN Graph Floyd-Warshall

N3

Fast Fourier transform

n.logn

Batched gradient descent

n.P.ep

Stochastic gradient descent

n.P.ep

Newton with Hessian computation

n3.ep

Conjugate gradient

n.n0.sqrt(cd)

L-BFGS

n.g.I

K-means (*)

C.n.D.I

Decision tree

n2.D

Lasso regularization - LARS(*)

n.D.min(n,D)

Hidden Markov Model Forward/backward 

s2.n

Multilayer Perceptron (*)

i.h.o.n.ep

Support vector machine (*) Newton

D2n + n3

Support vector machine (*) Cholesky

D2n + n2

Support vector machine (*) - SMO

D2n + n2

Principal Components Analysis (*)

D2n + n3

Expectation-Maximization (*)

D2n

Laplacian Eigenmaps

D.logk.n.logn + m.n.k2 + D.N2

Genetic algorithms

pz.logpz.I.G

Feed forward neural network

n.h.(i + h2/H).H



N

Number elements

n

Number of observations

D

Dimension/features

C

Number of classes

I

Number of iterations

ep

Number of epochs

k

Number of neighbors

a

1st dimension matrix

b

2nd dimension of matrix

c

3rd dimension of matrix

e

Number of eigenvalues

V

Number of vertices

E

Number of edges

P

Number of variables

cd

Condition of matrix

n0

Number of non-zero

g

Number of gradients

s

Number of states

i

Number of input variables

h

Number of hidden neurons

H

Number of hidden layers

o

Number output variables

G

Number of genes

pz

Population size



References