Sunday, November 3, 2013

Breakable Loops in Scala

Target audience: Beginner
Estimated reading time: 4'



Table of contents
Follow me on LinkedIn

Introduction

Contrary to C++ and Java, Scala does not allow developer to exit an iterative execution prematurely using a syntax equivalent to break.

var sum = 0

for( i <- 0 until 100) {
  sum += i
  if( sum > 400) 
    break   // won't compile!                                  
}

Scala purists tend to stay away from this type of constructs and use higher order collection methods such as
  exists( p: (T) => Boolean)
   find( p: (T) => Boolean)
takeWhile( p: (T) => Boolean
)
However these methods are not available outside collections. There are cases where a `break` construct may be a simpler solution.
This post review the different options to 'break' from a loop according to a predefined condition.

Breakable control

Although breaking out of a loop may not be appealing to "pure" functional developer, the Scala language provides formal Java and C++ developers with the option to use break and continue statements. The breakable statement define the scope for which break is valid.

import scala.util.control.Breaks._                            

var sum = 0
breakable { 
  for (i <- 0 until 100 ) {
    sum += i
    if( sum > 55) 
      break
  }
}

Any experienced developer would be quite reluctant to use such an idiom: beside the fact that the accumulator is defined as a variable, the implementation is unnecessary convoluted adding a wrapper around the loop. Let's try to find a more functional like approach to breaking out of any loop.

scan & fold to the rescue

Luckily, Scala provides collections with functional traversal patterns that can be used and chained to break, elegantly from a loop. The following code snippets introduce applies some of those traversal patterns to an array and a associative map to illustrate the overall "functional" approach to iteration. 
Let's consider the problem of extracting the elements of an array or map before a predefined condition is met for an element. For instance, let's extract the elements of an array or a hash map until one element has the value 0. The Scala programming language provide us with the takeWhile method (lines 7 & 10) that allows to to end traversing a collection and return a elements of the collection visited when a condition is met.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
val values = Array[Int](9, 45, 11, 0, 67, 33)

val mappedValues = Map[String, Int](
  "A"->2, "B"->45, "C"->67, "D"->0, "E"->56
)
      // Extract a subarray of values until the condition is met
values takeWhile( _ != 0) // -> Array(9, 45, 11)
 
     // Extract a submap until the condition is met
mappedValues takeWhile( _._2 != 0) 
     // -> Map(E -> 56, A -> 2, B -> 45, C -> 67)

The second case consists in accumulating values until a condition on the accumulator is met. Contrary to fold and reduce methods which apply a function (i.e summation) to all the elements of a collection to return a single value, scan, scanLeft and scanRight methods return a collection of values processed by the function.

In the following example, the invocation of the higher order method scanLeft (lines 4 generates an array and a hash map with the cumulative values. The takeWhile method i(lines 5 & 8) s then applied to the resulting array or map to return a collection of cumulative values until the accumulator exceeds 56. The same invocation can be used to return the element which push the accumulator beyond the threshold.

1
2
3
4
5
6
7
8
9
values.scanLeft(0)(_ + _).takeWhile (_ < 56) 
   //-> Array(0, 9, 54)
 
mappedValues.scanLeft(0) (_ + _._2)
   .takeWhile( _ < 56)
 
val indexTargetEl = values.scanLeft(0)(_ + _)
      .takeWhile (_ < 56).size -1            
val targetEl = values(indexTargetEl)



Tail recursion

A third and elegant alternative to exit from a loop is using the tail recursion which is supported natively in the Scala language. The first method, findValue exits the loop when a condition on the value is reached (line 5). The second method, findCummulative, is implemented as a closure and exits when the sum of elements exceeds a predefined value (line 15).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
val values = Array[Int](9, 45, 11, 0, 67, 33)
 
@scala.annotation.tailrec
def findValue(values: Array[Int], index: Int) : Int = {
   if( values(index) == 0 || index >= values.size)
     index
   else
     findValue(values, index + 1)
}
val newValues = values.slice(0, findValue(values, 0))   

val MAX_VALUE :Int = 86

@scala.annotation.tailrec
def findCumulative(sum: Int, index: Int): Int = {
   if( index >= values.size ||sum >= MAX_VALUE)
     index -1
   else
     findCumulative(sum + values(index), index + 1)
}

val newValues2 = values.slice(0, findCumulative(0, 0))

This concludes our review of three different Scala constructs available to developer to exit prematurely from a loop
  • breakable construct
  • Higher order method such as exists, find, takeWhile....
  • tail recursion

References

Tuesday, October 8, 2013

Symbolic Regression for Data Modeling

Target audience: Beginner
Estimated reading time: 6'




Overview

Symbolic Regression allows domain experts to create, add modify rules or policies extracted from data. The most commonly used algorithms used in Symbolic Regression are:
  • Genetic Algorithms
  • Learning Classifiers Systems
Symbolic regression is used in many application ranging from network performance optimization, predicting failure (MTBF), streaming data to detecting security breaches.


Presentation

The following presentation describes the main components, benefits and drawbacks of symbolic regression.




References

  • Genetic Programming: on the Programming of Computers by Means of Natural Selection - J. Koza - MIT Press 1992
  • Reinforcement Learning: An introduction (Adaptive Computation and Machine learning) - R. Sutton, A. Barto - MIT Press 1998

Friday, September 13, 2013

Performance of Error Handling in Scala

Target audience: Beginner
Estimated reading time: 4'



Table of contents
       Positive test
       Negative test
Follow me on LinkedIn

Overview

The Scala programming language supports three different approaches to handle errors.
  • Error codes
  • Exceptions
  • Option monadic pattern
The Java and C++ programmers are familiar with the first 2 approaches. Scala introduces the Option type which is defined as a Monad.
Let's take a simple example; computation of the function sin(sqrt(x).
The client code unwraps the return type, Option[Double] to handle the error.


def sqrt(x: Double): Option[Double] = {
  if(x < 0.0) None
  else Some(Math.sqrt(x))
}

sqrt(a) match {
  case Some(a) => Math.sin(a)
  case None => Console.println(s"argument $a < 0.0")
}

An alternative is to use a default value with getOrElse in case or failure.

sqrt(a).map( Math.sin(_) ).getOrElse {s"argument $a < 0.0"; 0.0 }


Caution: You should never insecurely unwrap an option using the method get.
sqrt(-3.0).get generates a java.util.NoSuchElementException: None.get exception.


Option type has few important benefits:

  • The return type None represents absence of returned value or reference, which is safer to process by the client code than a return Null (i.e stray pointers)
  • The Option type allows developers to create their own error handler: case None => f(do whatever you want or need to do)
  • The returned value(s) and error handler are encapsulate into the same entity, Option class.
However, there is no "free lunch" and I was curious to find out whether the benefits of using Option type comes with a performance cost. Let's compare the relative performance of the Option type, exception handling and basic error code on a very simple example.

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted 

Evaluation

I selected the division of double precision floating point values as our simple test. The simple test code, below is compiled and run with Scala 2.10.2 and Java JDK 1.7._45 on 64-bit Windows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
     // Handling divide by zero using error code NaN
def divErrorCode(x : Double, y : Double) : Double =
  if( Math.abs(y) < 1e-10) Double.NaN else x/y

       // Handling divide by zero using Arithmetic Exception
def divException(x : Double, y : Double) : Double = {
  if( Math.abs(y) < 1e-10) 
        throw  new ArithmeticException("Cannot divide by 0")
  x/y
}

     // Handling divide by zero using Option[Double] return type
def divOption(x : Double, y : Double) : Option[Double] =
     if( Math.abs(y) < 1e-10) None else Some(x/y)


The source code for the three handling errors follows:

1
2
3
4
5
6
7
8
9
  // Option error handling
divOption(x,y).getOrElse(-1.0)

  // Exception error handling 
Try ( divException(x,y) ).getOrElse(-1.0)

  // Return value error handling
val result = divErrorCode(x,y)
if( result.isNaN) -1.0 else result

Positive test

The test consists of running each of those local functions through a large number of iteration varying from 2,000,000 to 18,000,000. The graph that summarizes the test is defined below


As expected, the performance of each of those 3 error handling mechanisms degrades linearly according to the number of iterations. Clearly, the option error handling has the best performance and the exception has the highest overhead. incurs the lowest performance while the exception handler is by far the most efficient.

Negative test 

We run the same test with the number of iterations varying from 200,000 to 1,800,000, but with an arithmetic error at each iteration.



The exception handling mechanism has by far the highest overhead. The option monad and the returned error code mechanism have very similar performance.
Performance is only one of the elements to consider when selecting the most appropriate error handling mechanism. However, all things being equal, the overhead generated by repeatedly throwing exception (i.e. lengthy iteration or recursion) should be an incentive to consider alternative solutions

Note: The evaluation of the error handling mechanism has been performed using Scala 2.9. Results may vary in future releases. 

References

Monday, August 12, 2013

Performance of Scala iterators

Target audience: Beginner

Objective 

The Scala programming language provides software developers with several options to iterate through the elements of a collection:
  • for,while loops and foreach ( x => f(x)) higher order function.
  • map[Y] { f: X => Y) : Collection[Y] that creates a new collection by applying a function f to each element of the collection
  • foldLeft[Y] (y : Y)(f : (Y,X)=>Y) : Y) (resp. foldRight[Y] (y : Y)(f : (X,Y)=>Y) : Y) that applies a binary operator to each element of this collection starting left to right (resp. right to left)

This post attempts to quantify the overhead of the most commonly used iterative methods in Scala and demonstrate the effectiveness of the higher order methods map and foldLeft.

Scala loops for summation

The test runs are executed on a 'plain vanilla' dual core i3 2.1 Ghz running Linux CentOs 6.0. The first test consists of comparing compare the performance of the different options to traverse an array of Float with size varies from 2,000,000 to 40,000,000 elements then apply an operation += z to each of its members. The options are 
  foreach   (line 6)
  for loop  (line 9)
  while loop (lines 14 - 16)
  foldLeft   (line 19)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
val rGen = new scala.util.Random(System.currentTimeMillis)
var data = Array.fill(size)(rGen.nextFloat)
var sum = 0.0

  // Higher order method
data.foreach(sum += _)

  // for loop
for( x <- data) sum += x

  // while loop
var k = 0
val len = data.size
while( k < len) {
  sum += data(k)
  k += 1
}
   // fold
sum = data.foldLeft(0.0)((x, z) => x + z)

The test is repeated 25 times in order to reduce variance and noise generated by the garbage collector. The first 5 iterations are discarded to avoid the overhead of the initialization of the JVM. The mean value of the execution time for each method is computed for different size of an array from 2,000,000 to 40,000,000 floating point values (type Float). The results of the test are plotted in the graph below. The unit of time on the Y-coordinate is milliseconds.

The for, while and foreach expression have very similar performance.
The foldLeft is significantly faster (ratio 1:6)


Data transformation

The second test consists of comparing the performance of
  • foreach: "fills-up" iteratively a mutable array of type ArrayBuffer (line 3)
  • foreach: creates and updates a copy of the original array (immutable approach)(lines 7 & 8)
  • map: transform the original array into an array of square values (line 11)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
   // foreach with mutable array buffer
val newData = new mutable.ArrayBuffer[Float]
data.foreach( (x: Float) => newData.append(x *x))
val result = newData.toArray

  // foreach with update of immmutable array
val pData = Array.fill(sz)(0.0)
data.foreach( pData.update(i, _) )

  // map
val nData = data.map((x:Float) => x*x)

Let's run the same test (with the same setup defined in the previous section).



The test shows that the method dedicated to convert an array to other array by applying a natural transformation, map is by far the most efficient.
The methods dedicated to a specific task such as foldLeft for summation and map for data transformation are far more effective that the "plain vanilla" loop constructors. The tests are conducted with Scala 2.10.2


Important Notes:
The syntax or construct for has a very different meaning in Scala as in C or Java. It is actually a wrapper or syntactic sugar layer around the monadic chain of flatMap and map transformation as follows

  for (
      a <- f(x)  // flatMap
      b <- g(a)  // flatMap
      c <- h(b)  // map
  ) yield { }
A more elaborate and time consuming benchmark would consist of running multiple tests using several boolean (< !=..) and numeric (+, *, x => sin(x) ..) operators and computes the normalize mean and variance.

References

Saturday, July 6, 2013

Scala's Share-Nothing actors

Target audience: Advanced
Estimated reading time: 6'

Introduction

Even with the introduction of executor service and java.util.concurrent high level of abstraction in Java 1.5, programmers have found quite difficult to build reliable multi-threaded applications that shared data and locks It is quite common for less experienced developers to either over-synchronize data access and create deadlocks or allow race conditions and transition the application to an inconsistent state.

Scala's actors is a share-nothing, message passing model. At its core, an actor is a 'thread' with a mailbox to receive and respond to messages. Actors are sub-classes of scala.actors.Actor. The two main methods create an actor are:
  • act: implements the co-routine that correspond to the execution of the thread, similar to Thread.run() in Java.

  • react: process the messages sent by other actors and queued in the mailbox. The method react does not return (non blocking) when receiving and processing a message or request. There are two approach to exit a processing of messages: call exit or call act again with an exit condition being true
Note: The implementation described in this post relies on Scala 2.0 and is not guarantee to compile and execute as expected in the future version of the language. 

Workers ...

The example below describes a master actor (managing task) that creates and manages slave actors (or worker tasks). In order to avoid race condition and adding a lock, the reference newParent to master actor is sent to each slave actor (line 10) through the message passing mechanism react (lines 9 - 13). 
The slave Actor class implements a task for numIters executions of a specific process (line 15). The only way to exit the react loop is to call once again and exit on the condition parent != null (line 9). The computation method process to be executed by those slaves is an attribute of the slave (line 4).
Finally, the slave actor sends a message to its parent that its task is completed (line 17).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
class WorkerActor(
  numIters: Int = 25000, 
  message: String,
  process: (Int) =>(Double)) extends Actor {
 
  def act {
    var parent: Actor = null
    while( parent == null) {
      react {
        case (msg: String, newParent: Actor) => 
           parent = newParent
        act()
      }
    }
    process(numIters)   

    parent ! "DONE"
  }
}


... and master

The Master task or actor is responsible to launches then control slave actors. Once a worker actor is completed, it notifies the master through a message 'DONE' (line 15). The master actor starts all the worker tasks (line 7) and sends a non-blocking message, Activate (line 8).
Upon receiving the message DONE (line 15), the master actor decrements the reference count of the worker actor currently active as soon as one completes its execution (line 16). The master ultimately exits when the last worker completes its task and ultimately exits (
reference counter == 0) (line 17).


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
class MasterActor(
  slaveActors: List[WorkerActor]) extends Actor {
    
  def act() {
        
   for( workerActor <- workerActors) {
     workerActor.start
     workerActor ! ("Activate", this)
   }
        
   var refCounter = workerActors.size-1
   loop  {
      react {       
    
        case "DONE" => {
          refCounter -= 1
          if(refCounter == 0) 
             exit
        }
        case _ =>  { println("Incorrect message") }
      }
   }
  }
}

The main routine, ActorsTest.main, creates the worker which are launched by the master actor that acts as the managing task. The worker tasks execute a local function, waveSum, defined in real-time. This approach is an alternative to the most traditional functional futures.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
object ActorsTest extends App {
   val nOfWorkers = 10
   val numIters = 1250000
   val eps = 0.0001
      
   // Arbitrary method to simulate load on the CPU cores
   def waveSum(numIters: Int): Double =
     (0 until numIters)./:(0.0)(
       (s,i)=> s+Math.exp(Math.sin(i*eps)
      ) 
         
      // Create the worker tasks, then ...
   val workers = (0 until nOfWorker)./:(List[WorkerActor]())(
    (xs, i)=> new WorkerActor(numIters, i.toString, waveSum) :: xs
   )
   new MasterActor(sworkers).start
  }
}


References