## Tuesday, June 25, 2013

### Comparative Performance of Scala Iterators

Objective
The Scala programming language provides software developers with several options to iterate through the elements of a collection:

• for,while loops
• foreach ( x => f(x)) that iterates through a collection
• map[Y] { f: X =>) : Collection[Y] that creates a new collection by applying a function f to each element of the collection
• foldLeft[Y] (y : Y)(f : (Y,X)=>Y) : Y) (resp. foldRight[Y] (y : Y)(f : (X,Y)=>Y) : Y) that applies a binary operator to each element of this collection starting left to right (resp. right to left)
As developer, we need to be aware of the trade-offs between the different methods in terms of performance. This post compares the relative performance of the most commonly used iterative methods in Scala.

Evaluation
The test runs are executed on a 'plain vanilla' dual core i3 2.1 Ghz running Linux CentOs 6.0. The first test consists of comparing compare the performance of the different options to traverse an arrayval data : Array[Double] with size varies from 200,000 to 1,800,000 elements then apply an operation += z to each of its members.
Array.foreach loop

 val newArray = data.foreach( x => x + z) 
for or while loop
 for ( i <- 0 until data.length) { data(i) += z } 
Array.foldLeft method
 val newArray = data.foldLeft(Int)( (x : Int, z : Int) => x + z) 
Array.map method
 val newArray = data map { x => x + z } 

The test is repeated 10 times in order to reduce variance and noise generated by the garbage collector. The mean value of the execution time for each method is computed for different size of the array from 400,000 to 1,800,000 floating point values. The results of the test are plotted in the graph below. Clearly, the simple for loop has by far the shortest execution time, followed by the leftFold and foreach operation. Not surprisingly, the map method has the worst performance.

The second test consists of comparing the performance for initializing the elements of a list using the same constructs, foreach, map, leftFold and the for loop.

 val newList1 = data.foreach( x => z) val newList2 = data map { x => z} val newList3 = data.foldLeft[Int](0) ((x : Int,z : Int) => z) for(i <- 0 until size) { z :: data } 

The performance of each relative methods for initializing a list is similar to the operation on large arrays. Once again map is the laggard while the two other functional iterator, foreach and foldLeft have similar performance as for loop.

In summary, the for loop has to be used for iterations on large collections while foreach, forall, foldLeft, and foldRight can be used for small size collections for which the convenience of functional constructs overweight the small performance penalties.

References
Scala for the Impatient - C. Horstman - Addison-Wesley 2012
github.com/prnicolas