Friday, March 16, 2018

Fisher-Yates Shuffle in Scala

Target audience: Intermediate
Estimated reading time: 4'


This post describes the Fisher-Yates algorithm to shuffle randomly a sequence of objects.
Follow me on LinkedIn

Note: The code associated with this article is written using Scala 2.12.4

Overview

The generation of a sequence of randomly ordered elements from an existing sequence is a common problem. How can you be sure if the new sequence of items is actually randomly generated?
The Fisher-Yates shuffle algorithm (a.k.a. Donald Knuth shuffle) produces unbiased permutations with a similar likelihood.
One important benefit of Fisher-Yates is the ability to shuffle the elements of the sequence, in place.


Note:
There are several interpretations of the shuffling algorithm. This post describes the implementation of the most common Fisher-Yates algorithm


Fisher-Yates

Here is the Fisher-Yates in a nutshell. Let's consider an array arr of n elements
For each array index k from n-1 to 1
    Select a random value m  between k and 0
    Swap arr[r] and arr[k]
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def fisherYates[T](t: Seq[T]): Seq[T] = {

   setSeed(System.currentTimeMillis)
  
   (0 until t.size).foreach (n => {
       val randomIdx = n + nextInt(t.size-n)
       val tmp = t(randomIdx)
       t.update(randomIdx, t(n))
       t(n) = tmp
   })
   t
} 

The Fisher-Yates algorithm executes as an iteration with the following two steps:
  • Select a random item in the remaining sequence (line 6)
  • Swap the randomly selected item with the current item (lines 8,9)

Fast Fisher-Yates implementation

The performance of the Fisher-Yates algorithm can be improved by implementing the swapping of the randomly selected item with the current time in-place using bit manipulation operators.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def fisherYates2(seq: mutable.Seq[Int]): IndexedSeq[Int] = {
  require(seq.size > 0, "Undefined argument")

  setSeed(System.currentTimeMillis)
  (0 until seq.size).map(i => {
     var randomIdx: Int = i + nextInt(seq.size-i)
     seq(randomIdx) ^= seq(i)
     seq(i) = seq(randomIdx) ^ seq(i) 
     seq(randomIdx) ^= (seq(i))
     seq(i)
  })
}

The following code snippet tests the two implementations of the Fisher-Yates shuffling algorithm.
  val input = Array[Char](
    'a', 't', '4', '&', '9', '2', 'z', 'y', 'r', '8', '*', '?'
  )
  println(fisherYates[Char](input).mkString(","))
  // t,&,*,2,a,r,4,z,?,y,8,9

  println(fisherYates2(Array.tabulate(34)(n => n)).mkString(","))
  // 9,24,11,29,1,31,28,20,6,18,23,0,13,17,22,0,27,
  // 4,16,7,25,14,30,32,5,12,8,19,21,0,0,33,26,0
Note: Scala standard library provides developers with a shuffling method: Random.shuffle.

Thank you for reading this article. For more information ...

References


---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3