Monday, November 12, 2012

Lazy Evaluation in C++ & Scala

Target audience: Intermediate
Estimated reading time: 3'



Overview
Lazy evaluation is a not a new concept in software engineering. The concept is rather simple: initialize a variable or an object once it is needed. A typical use case is the instantiating of a large object in memory from the content of database: the lifecycle of the object is shorter than the application so it is convenient to allocate the memory when and if needed.
Both C++ and Scala support lazy evaluation using two very different approach: C++ relies on a class constructor to initialize iteratively its attribute member of a class while Scala supports lazy initialization natively.
 

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted.


C++
Lazy evaluation or fetching can be implemented in C++ using a design pattern. Let consider a large object which state is stored in a remote database or initialized through a request REST API. In many cases, only some of the attributes of the object need to be initialized from the remote data source (table for instance): load the content of the entire table would not make much sense.

How C++ address the problem? Let's consider a class that contains all the information regarding the visit of a user in a user.
 
class SiteVisit {
private:
  int  id;
  mutable string*  url;
  mutable long  visitDate;
  mutable std::list* targets;
  mutable char[4] sources; 
  void loadFromDB(std::list*)

public:
  SiteVisit(int id) : id(id), url(null), visitDate(-1L), targets(null), sources(null) { }    
  const string& getName() const;
  long          getVisitDate() const;
  std::list&    getTargets() const;
 const char[4]& getSources();
}


It is conceivable that not all the data describing the visit has to be retrieved for analysis or reporting. For instance, there is no need to load the targeted URLs, for all the visits from the database until a report needs to be created. In this case, the role of the constructor in C++ is to initialize the unique identifier for the object (or table unique primary key) and nothing else. The getTargets method retrieves the list of pages visited at a later stage in the application lifecycle. Therefore, this list is generated only once; the first time it is invoked. The method retrieveAllVisitedPages, iterates through all the pages of the site that have been visited.

std::list<string> SiteVisit::getTargets() {
  if( targets == null) {
     targets = new std::list
     loadTargetsFromDB(targets);
  }
  return targets;
}   

int retrieveAllVisitedPages(std::list&amp; visits, std::map* visitsMap) const {                           
  std::list::const_iterator visitsIt;
  std::list::const_iterator urlsIt;

  int counter = 0, totalPages = 0;
  for( visitsIt =visits.beging()l visitsIt !=visits.end(); visitsIt++) {
    std::list* urls = visitsIt.getTargets();   
    for( urlsIt = urls.begin(); urlsIt != urls.end(); urlsIt++) {
       counter = visitsMap[urlsIt];
       visitsMap-&gt;insert(std::make_pair(urlsIt, ++counter));
    }           
    totalPages += urls.size();
  }
  return totalPages;
}

Although feasible, the implementation of the lazy initialization is quite cumbersome as it involves at a minimum, a constructor and a method (2 step initialization). 


Scala
Scala supports lazy values and lazy local functions. Under some circumstances, the user may want to initialize a value only when it is required. A database developer would qualify this pattern as a write once, read many times pattern.

Note: The developer needs to keep in mind that lazy values or expressions are checked whether the value has been indeed already initialized, each time it is accessed. Such validation is to be thread-safe, and consequently adds overhead to the run-time execution.

Let's evaluate or experience the concept by writing a simple example in Scala. The SiteVisits class implements the basic function to retrieve the list of URLs/pages a user identified by its id access on a specific web site.

class SiteVisits(userId: Int, webSite: String) {
  lazy val urls =  {
    val urlsList = new ArrayBuffer[String]
   
   // Initialize the urls list from the database
    query(userId, webSite)
    urlsList.toArray
  }
  ... 
  def isValidWebsite; Boolean = ....
}

The list of URLs is not loaded from the database when an Instance of SiteVisits is created but when the value urls is read by the client code. For example, the methods of the class SiteVisits such as isValidWebsite can be executed without having to access the list of URLs the user visited on this particular web site. Let's look how lazy evaluation can be applied.

val siteVisit = new SiteVisits(91841, "www.yahoo.com")
  ... 
  if( siteVisit.isValidWebSite )  
     Console.println(siteVisit.urls.mkString(", ") 
  ..
}

In the example above, the list of URLS is not loaded if the site is not valid.


References
  • Effective C++   - S. Meyers  - Addison Wesley 1991
  • More Effective C++  - S. Meyers  - Addison Wesley 1993
  • Programming in Scala  - M. Odersky, L. Spoon, B. Venners  - Artima 2007
  • https://github.com/prnicolas

Friday, October 5, 2012

Type Class Polymorphism and Monads

Target audience: Advanced

Overview

Most of software developers associate the concept of polymorphism to dynamic binding & class inheritance available in the most common object oriented programming languages such as C++ and Java. As part of its functional programming "DNA", Scala provides developers with  alternative polymorphism approaches (beside the ubiquitous method override).

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted.


Object-Oriented Polymorphism

Java or C++ programmers are familiar with this implementation of polymorphism as described in the example below. A trait the key behavior of a collection by declaring two method += to add a new item into the collection and -= to retrieve the first item from the collection.

trait Collection[T <: Double] {                           
   def += (t:T) : Boolean
   def -= : T 
}

The class, OrderedCollection implements ordered collection of type inherited from Double by defining the two abstract methods += & -=. This class specializes the behavior of the trait by ordering the elements in the collection.

class OrderedCollection[T <: Double] extends Collection[T] {
   import scala.collection.mutable.TreeSet
 
   var c = TreeSet.empty(Ordering.fromLessThan[T]((x:Double, y:Double) => (x < y)))
   override def += (t:T) : Boolean = c.add(t)
   override def -= : T = { 
     val top = c.head
     c.remove(top)
     top 
   }
}

The following code snippet illustrate a simple use case for the ordered collection. Note the declaration of the type of the collection instance, thisCollection is not required and added for the sake of clarity.

val thisCollection: Collection[Double] = new OrderedCollection[Double]
thisCollection += 9.556
thisCollection += 45.7
thisCollection += 4.9

println(thisCollection)


Type Class Polymorphism 

Scala supports Type class polymorphism, also referred as Ah-hoc polymorphism in technical articles. Scala also supports Ah-Doc polymorphism which the ability to create and apply generic polymorphic operators or functions to arguments of different types. Ad hoc polymorphism lets you add features to a type on demand.
Let's define a Wrapper type (trait) which basic purpose is to wrap an element into a collection M[_] of this element. The Wrapper type class declares the operators apply and get that is to be implemented by wrapper associated to a predetermined collection.
- apply creates a collection M[T] of element of type T.
- get retrieves the first and only element of the collection.

Note: The type of the element in the collection is not a parameter type for the Wrapper: T is the type parameter for the collection and type M[_] is the parameter type for the wrapper. The Wrapper trait is defined as:

trait Container[M[_]] {
  def apply[T](t: T): M[T]
  def get[T](c: M[T]): T
}

We can create a type class ArrayBuffer to "wrap" any object by overriding the apply and get operators. The implicit conversion from ArrayBuffer to Wrapper is obviously independent from the type T of the contained element (validation of method argument is omitted).

implicit val arrayBuffer = new Container[ArrayBuffer] {
  override def apply[T](t: T) : ArrayBuffer[T] =  ArrayBuffer[T](t)
  override def get[T](ab: ArrayBuffer[T]) : T = ab.head 
}

def compose[M[_]: Container, X, Y](c1: M[X], c2: M[Y]) = {
   val c = implicitly[Container[M]]
   c(ab.get(c1), ab.get(c2))
}


Monads

One of the most known application of type class polymorphism is the monadic representation of collection. The concept and theoretical foundation of monads is discussed in the previous post. Monads define the minimum set of operations for a functional category of objects. Scala standard library includes quite a few of categories and their associated monadic representation: List[T], Option[T], ..
Let's extend our simple Wrapper to define a Monad for collection containing a single element. The monad must define a constructor, apply, a functor, map that converts a single element collection into a another single element collection by transforming its only element and a flatMap that generate a new single element collection from the element of an existing collection.

trait _Monad[M[_]] {
  def apply[T](t: T): M[T]
  def map[T, U](m: M[_])(f: T => U): M[U]
  def flatMap[T, U](m: M[_])(f: T => M[U]): M[U]
}

As with the trait wrapper we can create a implicit conversion from any single element ArrayBuffer into its associated monad to monadic operators can be used transparently by the client code.

implicit val mArrayBuffer = new _Monad[ArrayBuffer] {
  override def apply[T](t:T): ArrayBuffer[T] =  ArrayBuffer[T](t)
  override def map[T, U](m: ArrayBuffer[T])(f: T => U): ArrayBuffer[U] = ArrayBuffer[U](f(m.head))
  override def flatMap[T, U](m: ArrayBuffer[T])(f: T => ArrayBuffer[U]): ArrayBuffer[U] = f(m.head) 
}

Monads will be discussed in length in a future post.


References

Thursday, September 6, 2012

Don't Fear the Monad: Theory

Target audience: Advanced
Estimated reading time: 6'

Are you familiar with Category Theory, Monads, or Functors? Monads, in particular, are gaining popularity as functional languages like Clojure, Haskell, and Scala become more widely accessible.
This article marks the initial segment of our series, delving into the theory behind Monad implementation. The subsequent post, Don't fear the Monad: Scala will cover the practical implementation in Scala.


Follow me on LinkedIn


Overview

According to the dictionary:
"Monad: an elementary, fundamental substance that mirrors the structure of the world and serves as the foundation for deriving material properties."

In the realm of computer science, a Monad is a construct used to represent operations that can be either nested or chained for a specific data type. Consequently, software engineers can create pipelines to process data, with each step executing predefined rules defined by the monad. Monads are employed to handle tasks such as I/O operations, exceptions, and concurrency (or Futures).

Monads are grounded in the category theory, a field related to the more familiar group theory. The following section assumes some level of familiarity with mathematics, topology, and transformations.

Monoids
A monoid is a structure (S,*,1) consisting of an associative binary operation * over some set S, which  contains an identity element 1 for \[\forall s_{i}\in S\,\,\, s_{i}*s_{j} \in S\\(s_{i}*s_{j})*s_{k} = s_{i}*(s_{j}*s_{k})\\1*s_{i} = s_{i}*1=s_{i}\]

Categories
Categories are a generalization of monoids. A category consists of objects and morphisms. A morphism is a transformation that operates on objects. Let's consider a morphism  f, related to objects x and y:  \[f: x \mapsto y\] Morphisms have the following properties
* Transitivity: \[f: x \mapsto y\,\,,\,g: y \mapsto z\,\Rightarrow f\circ g: x \mapsto z\] * Omnipotence: \[id:x \mapsto x\]  * Associativity: \[f\circ (g\circ h) = (f\circ g) \circ h\]
Sets, Arrays, Vectors, Matrices, Lists or Hash tables constitute category which support morphism.  Let consider two operations, sqr and log on arrays of floating point values x: \[sqr : \{x_{i}\} \mapsto \{x_{i}^2 \} \,\,\, log : \{y_{i}\} \mapsto \{log(y_{i})\}\,\,\,\Rightarrow \,\, sqr \circ log : \{x_{i}\} \mapsto \{2.log(x_{i})\}\]
Functors
Functors are a special class of morphisms called homomorphisms.  A Functor F define a structure-preserving mapping, between two categories, X and Y,   F: X->Y. If f and g are operations on object on categories X/Y, 1 the identity function then \[f: x\in X \mapsto y \in Y,\,\,\,F:X \mapsto Y \Rightarrow F(f): F(x) \mapsto F(y)\\ F(1_{x}) = 1_{F(x)}\\ F(f \circ g) = F(f) \circ F(g)\]  A functor that maps a category to itself is called an endofunctor. A endofunctor, denoted as 1x,  maps every element and morphism of X to itself.

Natural Transformation
Natural transformations are mappings, from one functor to another. Let's denote two Functors F, G between two categories X, Y. A natural transformation phi between F & G is defined as \[\phi[x\in X]: F(x) \mapsto G(x)\\ \forall f \in F : x \mapsto y\,\,,\phi[y] \circ F(f) = G(f) \circ \phi[x]\]
Monads
A monad T = <T,1,m> on a category X consists of
  - A endofunctor T on category X    T:X->X
  - A unit natural transformation  n: 1 -> T
  - A join (or multiplication) natural transformation mu: T*T -> T

A monad obeys to the associativity axiom:\[If\,\, T^{2}(X) = T \circ T (X)\,\,and\,\, T^{3}(X) = T \circ T \circ T (X)\\T(\mu X) : T^{3}(X) \mapsto T^{2}(X)\\\mu X : T^{2}(X) \mapsto T(X)\\\mu T(X) : T^{3}(X) \mapsto T^{2}(X)\] as well as the unit axiom: \[T(\eta X) : T(X) \mapsto T^{2}(X)\\\eta  T(X) : T(X) \mapsto T^2(X)\\ I_{X} : T(X) \mapsto T(X)\] Example:
Let's consider the category, list of integer denoted List<Integer>. The monad related to the List<Integer> category is defined as \[Square : List \left \langle Int \right \rangle \mapsto List \left \langle Int\right \rangle\,\,\,\,\,\,\,\left \{ .. x .. \right \} \mapsto \left \{ .. x^{2} .. \right \}\\\eta : x \mapsto List\left \{ x \right \}\\\mu : List(List\left \langle Int \right \rangle ) \mapsto List \left \langle Int \right \rangle \,\,\left \{ \left \{ x_{0} .. x_{n}\right \}, \left \{y_{0} ..y_{m} \right \} \right \} \mapsto \left \{ x_{0},y_{0} .. x_{n},y_{n} .. y_{m} \right \}\] Monads are used to manage sequence of computations.
 - Exception: may throw exceptions or generated known errors. 
 - Option: may fail or return a result 
 - State: may use mutable state 
 - Reader: may maintain a context during computation 
 - Writer: may produce side-effect during computation 
 - I/O: may success or fails depends on global state or variables.