Friday, July 1, 2016

Monte Carlo Integration in Scala

Target audience: Intermediate
Estimated reading time: 5'


This post introduces an overlooked numerical integration method leveraging the ubiquitous Monte Carlo simulation.



What you will learn: How to apply the Monte Carlo method to compute intractable integrals.

Notes:
  • This article assumes the reader has some basic knowledge of the Scala programming language
  • Scala version: 2.12.4

Introduction

Some functions do not have a closed-form antiderivative, so their definite integrals cannot be computed by symbolic integration. For such cases, numerous numerical integration techniques are available for continuous functions. Examples include Simpson's rule, the Newton-Cotes quadrature rules, and Gaussian quadrature. These approaches are deterministic in their execution.
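For reference, a classic deterministic rule such as Simpson's approximates the definite integral from the values of the function at the end points and midpoint of the interval:
\[\int_{a}^{b}f(x)\,dx\approx \frac{b-a}{6}\left(f(a)+4f\!\left(\frac{a+b}{2}\right)+f(b)\right)\]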

On the other hand, the Monte Carlo method of numerical integration adopts a stochastic approach. It uses randomly chosen values to approximate the area under a curve, across a surface, or within any multidimensional space [ref 1]. Demonstrations of the Monte Carlo integration technique often involve applying a uniform random distribution of data points to estimate the integral of a single-variable function.

Basic concept

Let's consider the single variable function illustrated in the following line plot.
\[f(x)=\frac{1}{x}-0.5\]
[Line plot: definition of the outer area for the Monte Carlo random simulation]

Our goal is to calculate the integral of a function across the range [1, 3]. The Monte Carlo method for numerical integration involves three key steps [ref 2]:
  1. Establish the bounding area for the function based on the minimum and maximum values of x and y. In this instance, the x-values range over [1, 3] and the y-values span approximately [-0.17, 0.5].
  2. Produce random data points distributed within this outer area.
  3. Determine the proportion of these random data points that fall within the area defined by the function, relative to the total number of generated points (see the estimator below).
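In other words, if n out of N random points fall in the region bounded by the curve and the x-axis (with points under a negative section of the curve counted as -1), the definite integral over [a, b] is estimated as the signed fraction of hits scaled by the area of the bounding box:
\[\int_{a}^{b}f(x)\,dx\approx \frac{n}{N}\,A_{outer}\]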

Scala implementation

This implementation relies on Scala features available in version 2.11 and later [ref 3].
Let's define a class, MonteCarloIntegrator, to encapsulate these three steps.
The class has two arguments:
  • The function f to integrate
  • The number of random data points, numPoints, used in the simulation

import scala.util.Random

class MonteCarloIntegrator(f: Double => Double, numPoints: Int) {

  def integrate(from: Double, to: Double): Double = {
    val (min, max) = getBounds(from, to)
    val width = to - from
    // Height of the bounding box; its lower edge is 0 if f is non-negative, min otherwise
    val height = if (min >= 0.0) max else max - min
    val outerArea = width * height
    val randomx = new Random(System.currentTimeMillis)
    val randomy = new Random(System.currentTimeMillis + 42L)

    def randomSquare(randomx: Random, randomy: Random): Double = {
      val yOffset = if (min < 0.0) min else 0.0

      val numInsideArea = Range(0, numPoints).foldLeft(0)((s, _) => {
        // Draw a random point (ptx, pty) within the bounding box
        val ptx = randomx.nextDouble * width + from
        val pty = randomy.nextDouble * height + yOffset
        // Re-seed each generator from the other to decorrelate the two sequences
        randomx.setSeed(randomy.nextLong)
        randomy.setSeed(randomx.nextLong)

        // +1 for a point between the x-axis and a positive section of the curve,
        // -1 for a point between a negative section of the curve and the x-axis
        s + (if (pty > 0.0 && pty < f(ptx)) 1
             else if (pty < 0.0 && pty > f(ptx)) -1
             else 0)
      })
      numInsideArea.toDouble * outerArea / numPoints
    }
    randomSquare(randomx, randomy)
  }
}

The method integrate approximates the sum of the function over the interval [from, to]. The first step extracts the bounds of the outer area through getBounds; its surface, outerArea, is the product of width and height. Each coordinate is assigned its own random generator, randomx and randomy.
The nested method randomSquare counts the number of data points, numInsideArea, that fall into the area delimited by the function. The integral is then estimated as this count, relative to the total number of generated points, scaled by the outer area.


The method getBounds is described in the following code snippet. It is a simple, although not particularly efficient, approach to extracting the bounds of the integration. It breaks the interval down into numSteps steps of size stepSize and collects the minimum and maximum values of the function at those points.

def getBounds(from: Double, to: Double): (Double, Double) = {
  val numSteps = Math.sqrt(numPoints).floor.toInt
  val stepSize = (to - from) / numSteps

  // Walk the interval and keep track of the (min, max) values of the function
  (0 to numSteps).foldLeft((Double.MaxValue, -Double.MaxValue))(
    (minMax, n) => {
      val y = f(n * stepSize + from)
      updateBounds(y, minMax) match {
        case 0x01 => (y, minMax._2)   // new minimum
        case 0x02 => (minMax._1, y)   // new maximum
        case 0x03 => (y, y)           // new minimum and maximum
        case _ => minMax
      }
    }
  )
}


def updateBounds(y: Double, minMax: (Double, Double)): Int = {
  // Bit flag: 0x01 if y is a new minimum, 0x02 if y is a new maximum
  var flag = 0x00
  if (y < minMax._1) flag += 0x01
  if (y > minMax._2) flag += 0x02
  flag
}
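
As a quick sanity check, here is a minimal usage sketch (not part of the original listing) that applies the integrator to the function introduced earlier, f(x) = 1/x - 0.5, over the interval [1, 3]; the choice of 10,000 points is arbitrary.

val f = (x: Double) => 1.0/x - 0.5
val integrator = new MonteCarloIntegrator(f, 10000)
// The closed-form value is ln(3) - 1 ≈ 0.0986
val area = integrator.integrate(1.0, 3.0)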


Precision

You may wonder about the accuracy of the Monte Carlo method and how many randomly generated data points are needed for decent accuracy. Let's consider the same function
\[f(x)=\frac{1}{x}-0.5\]
and its indefinite integral, which provides the expected value of the definite integral:
\[\int f(x)dx=\ln(x)-\frac{x}{2}+C\]
The simple test consists of computing the error between the value produced by the closed-form definite integral and the sum produced by the Monte Carlo method, as implemented in the following code snippet (shown here with a simple linear test function).

val fct = (x: Double) => 2*x - 1.0
val integral = (x: Double, c: Double) => x*x - x + c

final val numPoints = 10000
val integrator = new MonteCarloIntegrator(fct, numPoints)

val predicted = integrator.integrate(1.0, 2.0)
val expected = integral(2.0, 0.0) - integral(1.0, 0.0)

Let's run the test 100 times, with the number of random data points varying between 100 and 50,000.
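A minimal sketch of such a test loop is shown below; the sample sizes, the 100-run average, and the relative-error metric are illustrative assumptions rather than the exact benchmark behind the results.

val numRuns = 100
val sampleSizes = Seq(100, 500, 1000, 5000, 10000, 50000)

sampleSizes.foreach(n => {
  val integrator = new MonteCarloIntegrator(fct, n)
  // Average the relative error over numRuns independent estimates
  val avgError = (0 until numRuns).map(_ =>
    Math.abs(integrator.integrate(1.0, 2.0) - expected)/expected
  ).sum / numRuns
  println(s"numPoints: $n average error: $avgError")
})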


The Monte Carlo method can achieve a reasonable level of accuracy even with a limited number of data points. In this scenario, the marginal gain in precision doesn't warrant generating a large quantity of random data points beyond 1000.

A more efficient strategy would be to use a termination condition that concludes the summation process as quickly as possible, eliminating the need to guess the ideal number of random data points. In each iteration, a new batch of random data points, totaling 'numPoints', is introduced. One straightforward method for determining convergence is to compare the difference in the sum between two successive iterations:
  |sum(existing data points + new data points) - sum(existing data points)| < eps

def randomSquare(randomx: Random, randomy: Random): Double = {
  val yOffset = if (min < 0.0) min else 0.0
  var oldValue = 0.0

  Range(0, numPoints).takeWhile(_ => {
    // Generate a fresh batch of numPoints random data points, as in the basic version
    val numInsideArea = Range(0, numPoints).foldLeft(0)((s, _) => {
      val ptx = randomx.nextDouble * width + from
      val pty = randomy.nextDouble * height + yOffset
      s + (if (pty > 0.0 && pty < f(ptx)) 1
           else if (pty < 0.0 && pty > f(ptx)) -1
           else 0)
    })
    val newValue = numInsideArea.toDouble * outerArea / numPoints
    val diff = Math.abs(newValue - oldValue)

    oldValue = newValue
    // eps is the predefined convergence threshold: keep iterating while
    // successive estimates still differ by at least eps
    diff >= eps
  })
  oldValue
}
In each iteration, we randomly generate a fresh batch of numPoints data points to refine the overall estimate. The termination of the process is governed by takeWhile, a higher-order method on Scala collections.
It's important to note that this implementation of Monte Carlo integration is deliberately simple and serves to demonstrate the application of stochastic methods in calculus. For functions with regions of high curvature (large second-order derivatives), recursive stratified sampling has proven more accurate for computing definite integrals.

References

[1] Monte Carlo Integration - C. Robert, G. Casella, Dartmouth College
[2] The Clever Machine: Monte Carlo Approximations - D. Stansbury
[3] Programming in Scala, 3rd edition - M. Odersky, L. Spoon, B. Venners
github.com/patnicolas



-------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design, and end-to-end deployment and support, with extensive knowledge in machine learning.
He has been director of data engineering at Aideo Technologies since 2017 and is the author of "Scala for Machine Learning", Packt Publishing, ISBN 978-1-78712-238-3.