Tuesday, June 25, 2013

Comparative Performance of Scala Iterators

Objective 
The Scala programming language provides software developers with several options to iterate through the elements of a collection:

  • for,while loops
  • foreach ( x => f(x)) that iterates through a collection
  • map[Y] { f: X =>) : Collection[Y] that creates a new collection by applying a function f to each element of the collection
  • foldLeft[Y] (y : Y)(f : (Y,X)=>Y) : Y) (resp. foldRight[Y] (y : Y)(f : (X,Y)=>Y) : Y) that applies a binary operator to each element of this collection starting left to right (resp. right to left)
As developer, we need to be aware of the trade-offs between the different methods in terms of performance. This post compares the relative performance of the most commonly used iterative methods in Scala.

Evaluation
The test runs are executed on a 'plain vanilla' dual core i3 2.1 Ghz running Linux CentOs 6.0. The first test consists of comparing compare the performance of the different options to traverse an arrayval data : Array[Double] with size varies from 200,000 to 1,800,000 elements then apply an operation += z to each of its members.
Array.foreach loop
val newArray = data.foreach( x => x + z)
for or while loop
for ( i <- 0 until data.length) {  data(i) += z }                         
Array.foldLeft method
val newArray = data.foldLeft(Int)( (x : Int, z : Int) => x + z)
Array.map method
val newArray = data map { x => x + z }
The test is repeated 10 times in order to reduce variance and noise generated by the garbage collector. The mean value of the execution time for each method is computed for different size of the array from 400,000 to 1,800,000 floating point values. The results of the test are plotted in the graph below. Clearly, the simple for loop has by far the shortest execution time, followed by the leftFold and foreach operation. Not surprisingly, the map method has the worst performance.



The second test consists of comparing the performance for initializing the elements of a list using the same constructs, foreach, map, leftFold and the for loop.
val newList1 = data.foreach( x => z)
val newList2 = data map { x => z}
val newList3 = data.foldLeft[Int](0) ((x : Int,z : Int) => z)
for(i <- 0 until size) { z :: data }
The performance of each relative methods for initializing a list is similar to the operation on large arrays. Once again map is the laggard while the two other functional iterator, foreach and foldLeft have similar performance as for loop.


In summary, the for loop has to be used for iterations on large collections while foreach, forall, foldLeft, and foldRight can be used for small size collections for which the convenience of functional constructs overweight the small performance penalties.

References
Scala for the Impatient - C. Horstman - Addison-Wesley 2012 
github.com/prnicolas

2 comments:

  1. your "for-loop" is a very bad example, it is no normal C loop. If you realy wanted to compare C loops against scala iterators you would have to use a while.

    like this:
    var i = 0
    while(i < data.length()){ ...; i += 1}

    ReplyDelete
  2. I have no idea what this article is supposed to prove.

    First of all, running a test only 10 times is not enough to guarantee that JIT has warmed up. But I'm going to let it pass. It's an understandable mistake.

    In the first test, the foreach loop does absolutely nothing. data.foreach( x => x + z) == () and nothing is modified, map creates a copy of the array, and for modifies the array in place. And finally, in the second test, foreach again does nothing, map does something barely sensible (creates a new list in a very contrived way) and foldLeft... returns the last element of the list.

    Of course the results are gonna vary, each of the tested methods in both cases does absolutely different things. It's not even comparing apples to oranges, it's comparing Apple Inc. to oranges.

    ReplyDelete