Target audience: Beginner
Estimated reading time: 3'
Overview
It is not unusual for Scala developers to struggle to reconcile an elegant functional programming style with efficient and fast execution. Higher-order collection methods are conducive to very expressive constructs, but often at the cost of poor performance. The zip method is no exception.
def GenIterable.zip[B](that: GenIterable[B]): CC[(A, B)]
Fortunately, the authors of the Scala library have been diligent enough to provide us with an alternative for the case of a pair of collections (type Tuple2).
Note: For the sake of readability of the algorithm implementations, all non-essential code such as error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers and imports is omitted.
scala.Tuple2.zipped
Unlike the GenIterable.zip method, which is available on every collection, zipped is a method specific to the class Tuple2. It is invoked on a pair of collections, as in (x, y).zipped.
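To make the difference concrete, here is a minimal sketch that computes the same dot product both ways. It assumes the Scala 2.12 collections used in this post (in Scala 2.13 zipped still compiles but is deprecated in favor of lazyZip). The names ZippedSketch, dotWithZip and dotWithZipped are purely illustrative and not part of the benchmark.

```scala
object ZippedSketch {
  // zip first allocates an intermediate Array[(Double, Double)],
  // then map walks over that array of pairs.
  def dotWithZip(x: Array[Double], y: Array[Double]): Double =
    x.zip(y).map { case (a, b) => a * b }.sum

  // zipped walks both arrays in lockstep; map receives a two-argument
  // function, so no intermediate array of tuples is built.
  def dotWithZipped(x: Array[Double], y: Array[Double]): Double =
    (x, y).zipped.map(_ * _).sum
}
```

Both functions return the same value for the same inputs; only the intermediate allocation differs.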
Let's evaluate the performance of zipped relative to the ubiquitous zip. For this purpose, let's create a benchmark that accesses elements of a zipped array. The first step is to create two functions that return the first and last elements of the zipped arrays.
The first method, tzip, exercises the GenIterable.zip method through the call x.zip(y). The second method, tzipped, wraps the Tuple2.zipped method through the call (x, y).zipped.
def tzip(x: Array[Double], y: Array[Double]): (Double, Double) = {
  val z = x.zip(y)
  (z.head._1, z.last._2)
}

def tzipped(x: Array[Double], y: Array[Double]): (Double, Double) = {
  val z = (x, y).zipped
  (z.head._1, z.last._2)
}
Next, we need a method that executes the two wrappers tzip and tzipped and measures the duration of their execution. We arbitrarily choose to accumulate the sum of the first elements and the product of the last elements of the zipped arrays.
The function zipTest has three arguments:
- A first dataset x of type Array[Double]
- A second dataset y of the same type
- A function argument f that receives one of the wrapper methods, tzip or tzipped
def zipTest(
  x: Array[Double],
  y: Array[Double],
  f: (Array[Double], Array[Double]) => (Double, Double)
): Unit = {
  val startTime = System.currentTimeMillis

  var sum = 0.0
  var prod = 1.0

  // Invoke the wrapper 50 times, accumulating its results
  Range(0, 50).foreach( _ => {
    val res = f(x, y)
    sum += res._1
    prod *= res._2
  })

  println(s"sum=$sum prod=$prod in ${System.currentTimeMillis - startTime} ms")
}
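As a side note, the timing logic can be separated from the accumulation. The sketch below is one possible variant, not the code used for the results in this post: it uses System.nanoTime and runs a few untimed warm-up iterations first, since JIT compilation can skew the first calls on the JVM. The names TimingSketch and meanMillis, and the warmup parameter, are assumptions made for illustration.

```scala
object TimingSketch {
  // Executes `body` `warmup` times without timing it, then `runs` times
  // under the clock, and returns the mean duration of a run in milliseconds.
  def meanMillis(warmup: Int, runs: Int)(body: => Unit): Double = {
    (0 until warmup).foreach(_ => body)
    val start = System.nanoTime
    (0 until runs).foreach(_ => body)
    (System.nanoTime - start) / 1e6 / runs
  }
}
```

With the wrappers defined earlier, a call could look like TimingSketch.meanMillis(5, 50) { tzip(x, y) }, and likewise for tzipped.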
The last step is to invoke the benchmark method zipTest for arrays of different lengths. The elements of the arrays are random floating point values.
import scala.util.Random

def zipTest(len: Int): Unit = {
  val x = Array.tabulate(len)(_ => Random.nextDouble)
  val y = x.clone
  zipTest(x, y, tzip)
  zipTest(x, y, tzipped)
}
Performance Results
The test is performed on a single 8-core i7 host with 32 GB of memory.
val step = 1000000
val initial = step
Range(0, 6).foreach(n => zipTest(initial + n*step) )
The results show that the performance of Array.zip degrades exponentially with the size of the arrays, whereas Tuple2.zipped degrades linearly. For 50 iterations on 1-million-element arrays, Tuple2.zipped is 17% faster than Array.zip, and it is 280% faster for 8 million elements.