Showing posts with label scan. Show all posts
Showing posts with label scan. Show all posts

Sunday, November 1, 2020

Evaluate Performance of Scala Tail Recursion

Target audience: Intermediate
Estimated reading time: 3'

Recursion refers to the technique where a function invokes itself, either directly or indirectly, and such a function is termed a recursive function. 
Some problems can be more effortlessly addressed using recursive algorithms. In this article, we will assess the performance of Scala's tail recursion in comparison to iterative approaches.


Table of contents
Follow me on LinkedIn
   

Overview

In Scala, the tail recursion is a commonly used technique to apply a transformation to the elements of a collection. The purpose of this post is to evaluate the performance degradation of the tail recursion comparatively to iterative based methods.
For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted.

Test benchmark

Let's consider a "recursive" data transformation on an array using a sliding window. For the sake of simplicity, we create a simple polynomial transform on a array of values
   {X0, ... ,Xn, ... Xp}
with a window w, defined as
   f(Xn) = (n-1)Xn-1 + (n-2)Xn-2 + ... + (n-w)Xn-w.  

Such algorithms are widely used in signal processing and technical analysis of financial markets (i.e. moving average, filters).

def polynomial(values: Array[Int]): Int = 
  (if(values.size < W_SIZE) 
     values 
  else 
     values.takeRight(W_SIZE)
  ).sum


The first implementation of the polynomial transform is a tail recursion on each element Xn of the array. The transform f compute f (values(cursor)) from the array values[0, ... , cursor-1] as describe in the code snippet below

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Evaluation(values: Array[Int]) {
  def recurse(f: Array[Int] => Int): Array[Int] = {

    @scala.annotation.tailrec
    def recurse(
      f: Array[Int] => Int, 
      cursor: Int, 
      results: Array[Int]): Boolean = {  
        
      if( cursor >= values.size) // exit condition
        true
      else {
        val arr = f(values.slice(cursor+1, cursor-W_SIZE))
        results.update(cursor, arr)
        recurse(f, cursor+1, results)
      }
    }

    val results = new Array[Int](values.size)
    recurse(f, 0, results)
    results
  }
}

The second implementation relies on the scanLeft method that return a cumulative of transformed value f(Xn).

def scan(f: Array[Int] => Int): Array[Int] = 
   values.zipWithIndex.scanLeft(0)((sum, vn) => 
         f(values.slice(vn._2+1, vn._2-W_SIZE))
  )

Finally, we implement the polynomial transform on the sliding array window with a map method.

def map(f: Array[Int] => Int): Array[Int] = 
   values.zipWithIndex.map(vn =>  f(values.slice(vn._2+1, vn._2-W_SIZE)))


Performance evaluation

For the test, each of those 3 methods is executed 1000 on a dual core i7 with 8 Gbyte RAM and MacOS X Mountain Lion 10.8. The first test consists of executing the 3 methods and varying the size of the array from 10 to 90. The test is repeated 5 times and the duration is measured in milliseconds.



The tail recursion is significantly faster than the two other methods. The scan methods (scan, scanLeft, scanRight) have significant overhead that cannot be "amortized" over a small array. It is worth noticing that the performance of map and scan are similar. The relative performance of those 3 methods is confirmed while testing with large size array (from 1,000,000 to 9,000,000 items).



Thank you for reading this article. For more information ...

References


---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3

Sunday, November 3, 2013

Breakable Loops in Scala

Target audience: Beginner
Estimated reading time: 4'



Table of contents
Follow me on LinkedIn

Introduction

Contrary to C++ and Java, Scala does not allow developer to exit an iterative execution prematurely using a syntax equivalent to break.

var sum = 0

for( i <- 0 until 100) {
  sum += i
  if( sum > 400) 
    break   // won't compile!                                  
}

Scala purists tend to stay away from this type of constructs and use higher order collection methods such as
  exists( p: (T) => Boolean)
   find( p: (T) => Boolean)
takeWhile( p: (T) => Boolean
)
However these methods are not available outside collections. There are cases where a `break` construct may be a simpler solution.
This post review the different options to 'break' from a loop according to a predefined condition.

Breakable control

Although breaking out of a loop may not be appealing to "pure" functional developer, the Scala language provides formal Java and C++ developers with the option to use break and continue statements. The breakable statement define the scope for which break is valid.

import scala.util.control.Breaks._                            

var sum = 0
breakable { 
  for (i <- 0 until 100 ) {
    sum += i
    if( sum > 55) 
      break
  }
}

Any experienced developer would be quite reluctant to use such an idiom: beside the fact that the accumulator is defined as a variable, the implementation is unnecessary convoluted adding a wrapper around the loop. Let's try to find a more functional like approach to breaking out of any loop.

scan & fold to the rescue

Luckily, Scala provides collections with functional traversal patterns that can be used and chained to break, elegantly from a loop. The following code snippets introduce applies some of those traversal patterns to an array and a associative map to illustrate the overall "functional" approach to iteration. 
Let's consider the problem of extracting the elements of an array or map before a predefined condition is met for an element. For instance, let's extract the elements of an array or a hash map until one element has the value 0. The Scala programming language provide us with the takeWhile method (lines 7 & 10) that allows to to end traversing a collection and return a elements of the collection visited when a condition is met.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
val values = Array[Int](9, 45, 11, 0, 67, 33)

val mappedValues = Map[String, Int](
  "A"->2, "B"->45, "C"->67, "D"->0, "E"->56
)
      // Extract a subarray of values until the condition is met
values takeWhile( _ != 0) // -> Array(9, 45, 11)
 
     // Extract a submap until the condition is met
mappedValues takeWhile( _._2 != 0) 
     // -> Map(E -> 56, A -> 2, B -> 45, C -> 67)

The second case consists in accumulating values until a condition on the accumulator is met. Contrary to fold and reduce methods which apply a function (i.e summation) to all the elements of a collection to return a single value, scan, scanLeft and scanRight methods return a collection of values processed by the function.

In the following example, the invocation of the higher order method scanLeft (lines 4 generates an array and a hash map with the cumulative values. The takeWhile method i(lines 5 & 8) s then applied to the resulting array or map to return a collection of cumulative values until the accumulator exceeds 56. The same invocation can be used to return the element which push the accumulator beyond the threshold.

1
2
3
4
5
6
7
8
9
values.scanLeft(0)(_ + _).takeWhile (_ < 56) 
   //-> Array(0, 9, 54)
 
mappedValues.scanLeft(0) (_ + _._2)
   .takeWhile( _ < 56)
 
val indexTargetEl = values.scanLeft(0)(_ + _)
      .takeWhile (_ < 56).size -1            
val targetEl = values(indexTargetEl)



Tail recursion

A third and elegant alternative to exit from a loop is using the tail recursion which is supported natively in the Scala language. The first method, findValue exits the loop when a condition on the value is reached (line 5). The second method, findCummulative, is implemented as a closure and exits when the sum of elements exceeds a predefined value (line 15).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
val values = Array[Int](9, 45, 11, 0, 67, 33)
 
@scala.annotation.tailrec
def findValue(values: Array[Int], index: Int) : Int = {
   if( values(index) == 0 || index >= values.size)
     index
   else
     findValue(values, index + 1)
}
val newValues = values.slice(0, findValue(values, 0))   

val MAX_VALUE :Int = 86

@scala.annotation.tailrec
def findCumulative(sum: Int, index: Int): Int = {
   if( index >= values.size ||sum >= MAX_VALUE)
     index -1
   else
     findCumulative(sum + values(index), index + 1)
}

val newValues2 = values.slice(0, findCumulative(0, 0))

This concludes our review of three different Scala constructs available to developer to exit prematurely from a loop
  • breakable construct
  • Higher order method such as exists, find, takeWhile....
  • tail recursion

References