Wednesday, April 3, 2019

Genetic Algorithms II: Operators

Target audience: Advanced
Estimated reading time: 6'

In the realms of science and engineering, genetic algorithms serve dual purposes: as adaptable algorithms addressing real-world challenges and as computational representations of natural evolutionary mechanisms. 

This article stands as the subsequent chapter in our series on genetic algorithms. Within, we detail the Scala-based implementation of genetic operators, encompassing selection, cross-over, and mutation, acting upon a chromosome population.

Table of contents
Follow me on LinkedIn
 

Introduction

In the first article on genetic algorithms, you learned about the key elements of genetic algorithms.Genetic Algorithms I: Foundation
  • Chromosomes
  • Genes
  • Quantization
  • Population
This second part introduces the concept and implements of genetic operations (cross-over, mutation and selection) on a population of chromosomes. These operators are applied recursively on each chromosome and each of the genes it contains.
The 3rd and final post on genetic algorithms, explores the application of genetic algorithm as a solver or optimizer Genetic Algorithms III: Solver

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted


Selection

The first genetic operator of the reproduction cycle is the selection process. The select method of the class Population implements the steps of the selection of the fittest chromosomes in the population in the most efficient manner, as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def select(score: Chromosome[T] => Unit, cutOff: Double) = {
   val cumul = chromosomes./:(0.0)(
      (s,x) =>{ score(xy); s + xy. unfitness} 
   ) 

   chromosomes foreach( _ /= cumul) 
   val newChromosomes = chromosomes.sortWith(_.unfitness < _.unfitness)
   val cutOffSize = (cutOff*newChromosomes.size).floor.toInt
   val newPopSize = if(limit<cutOffSize) limit else cutOffSize

   chromosomes.clear
   chromosomes ++= newChromosomes.take(newPopSize)
}

The select method computes the cumulative sum of an unfitness value, cumul, for the entire population (lines 2 -3). It normalizes the unfitness of each chromosome (line 6), orders the population by decreasing value (line 7), and applies a soft limit function on population growth, cutOff (line 8). The last step reduces the size of the population to the lowest of the two limits: the hard limit, limit, or the soft limit,cutOffSize (line 9).

Cross-over

There are several options to select pairs of chromosomes for crossover. This implementation ranks the chromosomes by their fitness value and then divides the population into two halves. Finally, it pairs the chromosomes of identical rank from each half as illustrated in the following diagram: 


Population cross-over
The crossover implementation, +- , selects the parent chromosome candidates for crossover using the pairing scheme described earlier.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
def +- (xOver: Double): Unit = {
 if( size > 1) {
   val mid = size>>1
   val bottom = chromosomes.slice(mid, size) 
   
   val gIdx = geneticIndices(xOver)
   val offSprings = chromosomes.take(mid)
      .zip(bottom)
      .map(p => p._1 +-(p._2, gIdx))
      .unzip

   chromosomes ++= offSprings._1 ++ offSprings._2
 }
}

The implementation of the cross-over on the population of chromosomes ranked by their unfitness consists of
  • Get the mid point of the list of ranked chromosomes (line 3)
  • Get the least fit half of the chromosome (line 4)
  • Retrieve the position of the bit in the chromosome the cross-over applies (line 6)
  • Retrieve the two offspring by crossing over pairs of chromosomes from each half of the ranked population (lines 7 - 10)
  • Add the two off-springs to the current population (line 12)

Chromosome cross-over
The implementation of the crossover for a pair of chromosomes using hierarchical encoding follows two steps:
  • Find the gene on each chromosome that corresponds to the crossover index, gIdx.chOpIdx, and then swap the remaining genes (line 6)
  • Split and splice the gene crossover at xoverIdx (lines 8 & 12)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
def +-(
   that: Chromosome[T], 
   gIdx: GeneticIndices
): (Chromosome[T], Chromosome[T]) = {
 
    val xoverIdx = gIdx.chOpIdx
    val xGenes = spliceGene(gIdx, that.code(xoverIdx) )
 
    val offSprng1 = code.slice(0, xoverIdx) ::: xGenes._1 
        :: that.code.drop(xoverIdx+1)

    val offSprng2 = that.code.slice(0, xoverIdx) ::: xGenes._2 
        :: code.drop(xoverIdx+1)
 
   (Chromosome[T](offSprng1), Chromosome[T](offSprng2)
}

The crossover method computes the index of the bit that defies the crossover xoverIdx in each parent chromosome. The genes code(xoverIdx) and that.code(xoverIdx) are swapped and spliced by the spliceGene method to generate a spliced gene (lines 9 - 13).

The method spliceGene is implemented below.

 
def spliceGene(gIdx: GeneticIndices, thatCode: T): (T, T) = {
     ((this.code(gIdx.chOpIdx) +- (thatCode, gIdx)),
      (thatCode +- (code(gIdx.chOpIdx), gIdx)) )
}

Gene cross-over
The crossover is applied to a gene through the +- method of the Gene class. The exchange of bits between the two genes this and that uses the BitSet Java class to rearrange the bits after the permutation (lines 4 - 6). The bit string is then decoded to produce the predicate or logical representation of the gene (line 8).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def +- (that: Gene, idx: GeneticIndices): Gene = {
    val clonedBits = cloneBits(bits)
  
    Range(gIdx.geneOpIdx, bits.size).foreach(n =>
        if( that.bits.get(n) ) clonedBits.set(n)
        else clonedBits.clear(n)
    ) 
    val valOp = decode(clonedBits)

    Gene(id, valOp._1, valOp._2)
}

Mutation 

Population mutation
The mutation operator ^ invokes the same operator for all the chromosomes in the population and then adds the mutated chromosomes to the existing population, so that they can compete with the original chromosomes. 
The mutation parameter mu is used to compute the absolute index of the mutating gene, geneticIndices(mu). We use the notation ^ to define the mutation operator to remind the reader that the mutation is implemented by flipping one bit:

 
def ^ (mu: Double): Unit =
     chromosomes ++= chromosomes.map(_ ^ geneticIndices(mu)) 
 

Chromosome mutation
The implementation of the mutation operator ^ on a chromosome consists of mutating the gene of the index gIdx.chOpIdx and then updating the list xs of genes in the chromosome. The method returns a new chromosome with this new generic code that is added to the population.

def ^ (gIdx: GeneticIndices): Chromosome[T] = {
    val mutated = code(gIdx.chOpIdx) ^ gIdx

    val xs = Range(0, code.size).map(
        i => if(i==gIdx.chOpIdx) mutated else code(i)
    ).toList

    Chromosome[T](xs)
}

Gene mutation
Finally, the mutation operator flips (XOR) the bit at the index gIdx.geneOpIdx
The ^ method mutates the cloned bit string, clonedBits (line 2) by flipping the bit at the index gIdx.geneOpIdx (line 3). It decodes line 5) and converts the mutated bit string by converting it into a (target value, operator) tuple (line 7). The last step creates a new gene from the target-operator tuple.

1
2
3
4
5
6
7
8
def ^ (gIdx: GeneticIndices): Gene = {
     val clonedBits = cloneBits(bits)
     clonedBits.flip(gIdx.geneOpIdx)

     val valOp = decode(clonedBits)

     Gene(id, valOp._1, valOp._2)
}

This concludes the second part of the implementation of genetic algorithms in Scala, dedicated to genetic operators.

Thank you for reading this article. For more information ...

References


---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3

Monday, March 11, 2019

Genetic Algorithm I: Foundation

Target audience: Advanced
Estimated reading time: 6'

Drawing inspiration from Charles Darwin's theory of natural evolution, a genetic algorithm is a search heuristic that mirrors the principles of natural selection. Here, the most suitable individuals are chosen for reproduction to generate the subsequent generation.
In this piece, we will explore the foundational concepts of Genetic Algorithms and delve into their implementation using Scala.


Table of contents
References
Follow me on LinkedIn
 

Introduction

Scala is a type checked, object oriented and functional programming language built on top of Java Virtual Machine. We can leverage Scala's lambdas, partial functions and closures to implement the computational workflow of evolutionary  algorithm.
The second part of the implementation of genetic algorithm in Scala introduces the concept and implementation of genetic operations (cross-over, mutation and selection) on a population of chromosomes. Genetic Algorithms II: Operators
The 3rd and final post on genetic algorithms, explores the application of genetic algorithm as a solver or optimizer Genetic Algorithms III: Solver


Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted

Theory

Genetic Algorithms have been invented by John Holland in the 1970s and derived their properties from the theory of evolution of Darwin. A living organism consists of cells which are composed of identical chromosomes. Chromosomes are strings of DNA and serves as a model for the whole organism. A chromosome consist of genes that are blocks of DNA and encodes a specific protein.
Recombination (or crossover) is the first stage of reproduction. Genes from parents generate the whole new chromosome (offspring) that can be mutated. During mutation, one or more elements, also known are individuals of the DNA strand or chromosome is changed. This changes are mainly caused by errors in copying genes from parents. The success of the organism in its life measures its fitness.
In computer science, Genetic Algorithms are a way of solving problems by mimicking nature. They use the same combination of selection, recombination and mutation to evolve a set of candidates for resolving a given problem. The basic computational sequence is
  1. Select the initial population (search space) as a set of chromosones which represent candidates for solving a problem.
  2. Encode the chromosones into a vector of real values (continuous process)  X(x1,..,xn) or string of bits (00101110101111010111).
  3. Evaluate or rank the chromosones using a fitness function. 
  4. Select the fittest chromosones (which meet the minimum fitness criteria) for reproduction.
  5. Pair chromosones for cross-over. The process consists of applying a cross-over rate to interchange fragment or section of the "parent" chromosones from a random point in the encoded string or vector. 
  6. Mutate chromosomes by flipping one or more of their elements(bits) using a randomly generated index bounded by a mutation rate.
  7. Iterate the reproduction cycle (steps 2 to 6) for the current population.
Each of the genetic operators (selection, cross-over and mutation) relies on a parameter:
  • Selection rate is the random  threshold value to reduce the current population of chromosones according to their fitness
  • Crossover rate is used to compute the index beyond with the elements or bits of two parents chromosones are exchange.
    The following parents 
    10001001110010010 1010001001000011 will generate the following offsprings 10001001110010011 and 1010001001000010
  • Mutation Rate is used to compute the index of the element(s) or bit(s) in a chromosome that is/are mutated (or flipped).10001001110010010 becomes 10001001110000010

Chromosomes

The first step is to implement the key components of a genetic algorithm The implementation has to be generic enough to support any kind of fitness functions and encoding scheme.
A chromosomes is composed of one or several genes, each representing a predicate, axiom, fact or a rule/policy.
A Chromosone class wraps a list of parameterized genes, code (genetic code) (line 1). The most important methods related to chromosomes are
  • +- for the cross-over between two chromosomes (line 6)
  • ^ for the mutation of each chromosome (line 11)
  • /= for the normalization of the fitness value of each chromosome during the selection process (line 14)
This implementation uses the unfitness value of the chromosome for ranking, instead the usual fitness value. It is defined as 1- normalized_fitness (line 3).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
final class Chromosome[T <: Gene](val code: List[T]) {  

   var unfitness: Double = 1000*(1.0 + Random.nextDouble)

    // cross-over
   def +- (
       that: Chromosome[T], 
       gIdx: GeneticIndices): (Chromosome[T], Chromosome[T])= {}

   // mutation
   def ^ (gIdx: GeneticIndices): Chromosome[T] = {}

   // normalization of fitness
   def /= (normalizeFactor: Double): Unit = {}
  ...
}

The class GeneticIndices converts the cross-over and mutation probability (or rate) into indices (or position) on the bits of the chromosome or gene the cross-over and mutation is applied. The parameter chOpIdx defines the index along the bit string of a chromosome and geneOpIdx the index along the bit string of a gene

 
case class GeneticIndices(chOpIdx: Int, geneOpIdx: Int) 
 

The generic indices are generated from the cross-over or mutation rate (probability) by the method geneticIndices (line 1). The index of the bit for genetic operations on the chromosome is defined as chIdx (ratio over the length of the chromosome) (lines 5 - 8). Likewise, the index or position of the bit genetic operations on the genes is defined as gIdx (line 11)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
def geneticIndices(prob: Double): GeneticIndices = {
     var idx = (prob*chromosomeSize).floor.toInt

     val chIdx = 
        if(idx == chromosomeSize) 
           chromosomeSize-1 
        else 
           idx

     idx = (prob*geneSize).floor.toInt 
     val gIdx = if(idx == geneSize) geneSize-1 else idx

     GeneticIndices(chIdx, gIdx) 
}


Genes

The class Gene wraps the genetic code associated to a predicate or a rule and takes four parameters:
  • id: This is the identifier of the gene. It is usually the name of the variable represented by the gene (line 2).
  • target: This is the target value or threshold to be converted or discretized into a bit string (line 3).
  • op: This is the operator that is applied to the target value (line 4).
  • discr: This is the discretization (or quantization) scheme that converts a numerical (real) value to an integer which is then converted into bits. It also supports the conversion of the bits string back to a numerical value.
The genes as subjected to the same genetic operations as the chromosomes: cross-over (line 8) and mutation (line 9). The conversion from bits string back to numerical value is implemented by the method decode (line 11).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Gene(
    val id: String, 
    target: Double, 
    op: Operator)(implicit discr: Discretization) {

   val bits: Bitset

   def +- (that: Gene, gIdx: GeneticIndices): Gene = {}
   def ^ (gIdx: GeneticIndices): Gene = ^ (gIdx.geneOpIdx)
   def ^ (idx: Int): Gene = {}
   def decode(bits: BitSet): (Double, Operator)  { }
}


Quantization

The class Discretization has two arguments
  • A function, toInt, converts a real value to an integer (line 2)
  • A reverse function toDouble converts the integer back to a real value (line 3)
The main benefit to encapsulate the two conversions into a single class is to reduce the risk of inconsistency and inaccuracy between these two conversions.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
class Discretization(
   toInt: Double => Int, 
   toDouble: Int => Double) {
 
   def this(R: Int) = this(
      (x: Double) => (x*R).floor.toInt, (n: Int) => n/R
   )
}

def decode(bits: BitSet): (Double, Operator) = 
      (discr.toDouble(convert(geneBits.rValue, bits)), 
       op(convert(geneBits.rOp, bits))
)

The instantiation of Gene (lines 5-6) converts a predicate into a bit string of type java.util.BitSet. The bit string is decoded by the decode method of the Gene companion object (lines 10 - 12).


Genetic population

The Population class takes two parameters:
  • limit: This is the maximum size of the population (line 4)
  • chromosomes: This is the pool of chromosomes defining the current population (line 5).
A reproduction cycle executes the following sequence of three genetic operators on a population:
select for the selection of the fittest chromosomes of the population (line 8)
+- for the pair-wise crossover of all the chromosomes (line 1)
^ for the mutation of each chromosome (line 12).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
type Pool[T <: Gene] = ArrayBuffer[Chromosome[T]]

class Population[T <: Gene](
    limit: Int, 
    val chromosomes: Pool[T]) {

  // selection operator
  def select(score: Chromosome[T] => Unit, cutOff: Double)
  // cross-over operator
  def +- (xOver: Double)
  // mutation operator
  def ^ (mu: Double)
  // ...
}

This completes the first part of the implementation of genetic algorithms in Scala. The next post dives into the details in the implementation of the genetic operators.

Thank you for reading this article. For more information ...

References

Saturday, January 5, 2019

Versioning Scala Classes

Target audience: Beginner
Estimated reading time: 4'

Versioning stands as a prevalent hurdle in the commercial software development landscape. In Java or Scala, the class loader is frequently leveraged to accommodate multiple iterations of a library, executable, or framework.
In this article, we introduce a straightforward versioning approach employing a class loader and a jar file pinpointed through its URL.


Follow me on LinkedIn

Note: The code associated with this article is written using Scala 2.11.8


Simple implementation

A library can be easily versioned by creating multiple jar files, one for each version. Let's select a simple case for each a library is self-contained in a jar file with two versions
  • ArrayIndex1.jar
  • ArrayIndex2.jar
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
import java.net.{URL, URLClassLoader}
import java.io.File
import java.lang.reflect.Method

val cl1 = new URLClassLoader( 
    Array[URL](new File("ArrayIndex1.jar").toURL),   
    Thread.currentThread().getContextClassLoader()
)
 
val cl2 = new URLClassLoader( 
    Array[URL](new File("ArrayIndex2.jar").toURL),  
    Thread.currentThread().getContextClassLoader()
)
  
val compatibleClass: Class[_] = 
    if( version > 0.9) 
       cl1.loadClass("ArrayIndex") 
    else 
       cl2.loadClass("ArrayIndex")

val obj = compatibleClass.newInstance
val methods = compatibleClass.getMethods
println(s"${methods.map( _.getName).mkString(",")}")

The first step consists of loading the two versions of the library through their URL by converting each jar file names to a URL. This feat is accomplished by instantiating the class loaders: In this particular case, the class loader is undefined by its URL with the type URLClassLoader (line 5 & 9). The jar files are loaded within the current thread (line 7 & 11): A more efficient approach would consists of creating a future to load the classes asynchronously.
The class to be used in this case depends on the variable version (line 15). Once the appropriate class is loaded, its instance and method are ready available to the client be invoked.

Thank you for reading this article. For more information ...

References

---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3