Wednesday, May 31, 2023

Pythonic Coding Exercises: Recursion

Target audience: Intermediate
Estimated reading time: 4'
The most effective way to truly grasp a programming language is by diving in and tackling unique algorithm challenges. In this article, we'll explore common uses of recursion, taking full advantage of Python's dynamic capabilities.

Let's look at some coding gems and have the fun begin!!

Table of contents

       Find a line with max number of points

Follow me on LinkedIn

What you will learn: How to apply recursion in Python to solve search, optimization and computational problems.

Note: The implementation uses Python 3.11

Introduction

Recursion involves solving a problem by breaking it down into simpler instances of the same problem. Essentially, it's a method where a procedure references itself during its execution.

Tail recursion, a specific kind of recursion, occurs when the function's last action is a recursive call. It's optimized to prevent unnecessary stacking, allowing the recursive calls to delve deeper without overwhelming the call stack.

Implicit recursion is another variant where a function indirectly calls itself, without an overt recursive statement

Coding exercises

Find a line with max number of points

ProblemGiven n points on a 2-dimension plane, find the maximum number of points that lie on the same line.

Solution From the mathematics standpoint, N points with coordinates (x, y) are aligned if slope dy/dx between each of these N points are the same (i.e. (1, 1), (3, 3) and (6, 6) belongs to line y = 3.x). This implementation uses recursion. We define a data class Point with basic == and slope computation methods. Time complexity O(n).

Implementation:

from dataclasses import dataclass
from typing import TypeVar, List, Set, Dict
Point = TypeVar("Point")


@dataclass
class Point(object):
    """ 
      Convenient class wrapper to implement the == operator and computation of the derivative 
    """
    x: int
    y: int

    # Override == operator for this class
def __eq__(self, other: Point) -> bool: return self.x == other.x and self.y == other.y  
    # Compute the derivative/sloped between any two points
    def slope(self, other: Point) -> float:
        if self == other:          # To safe guard against self
            return -1.0
        elif self.x == other.x:  # To avoid divide by zero
            return 1e+8
        else:                          # Otherwise compute the derivative
            return float(self.y - other.y)/(self.x - other.x)




def max_num_points(points: List[Point]) -> List[Point]:
  """ 
   Compute the number of points which belongs to the same line (same slope) using recursion
  """
    
   # Execute recursion along the list of points using  list index
   def align(index: int, visited: Dict[float, List[Point]]) -> Dict[float, List[Point]]:
      if index >= len(points) - 1:
         return visited
      else:
        for pt in points:
                
           # Make sure we exclude a data point already visited
           if not points[index] == pt:
              this_slope = points[index].slope(pt)
              # Update or create
              new_visited: List[Point] = visited[this_slope] if this_slope in visited else []

             # Avoid duplicate list of points associated with
              if pt not in new_visited:
                  new_visited.append(pt)

                 # Update the dictionary { slope: list of points with same slope }
                 visited[this_slope] = new_visited

            return align(index+1, visited)
 

 if len(points) == 0:
    return [] 
 else:
   all_slopes_dict = align(0, {})

   # Finally extract the list of points associated with
   # a slope with maximum number of points.
   max_num_pts = -1
   output_points = None
   for slope, points in all_slopes_dict.items():
      if max_num_pts < len(points):
         max_num_pts = len(points)
         output_points = points

   return output_points


Test:
input_points = [
  Point(1, 1), Point(2, 4), Point(3, 3), Point(4, 1), Point(5, 11), Point(6, 6), Point(8, 13)
]

print(str(max_num_points(input_points))) 
    # [Point(x=3, y=3), Point(x=6, y=6), Point(x=1, y=1)]



Optimize coins distribution

ProblemFind the minimum set of coins for a set of coin_types (1 cent, 5 cents  ... 10 cents) required to foot a given bill (meet a target_amount). For example  168 = 100 cents * 1 + 50 cents * 1 + 10  cents * 2 + 5 cent * 1 + 1 cent * 3.

Solution Use recursion, minimizing the number of coins needs to reach the needed amount at each step. The recursion exits when either all the various type of coins have been used or the target amount has been finally reached. The time complexity is O(n)

Implementation:

class CoinsDistribution(object):
    """
        Find the optimal distribution of set of coins (1 cent, 5 cent, 10 cent,..)
        to foot a given bill (i.e. 127 cents = 100 cents*1 + 25 cents*1 + 1 cent* 2
        Recursion on the remaining amount to close once a coin if found
    """
    def __init__(self, coins: []):
        # We do not assume the coins are actually sorted
        self.coin_types = sorted(coins, reverse=True)

    
    def find(self,  target_amount: int) -> List[int]:

        def _find(left_over_amount: int, index: int, coin_distribution: List[int]) -> List[int]:
            # Recursion exit condition (amount reached or no more coin left)
            if left_over_amount == 0 or index >= len(self.coin_types):
                return coin_distribution
            
           else:
                remaining_amount = left_over_amount

                # Attempt to assign as many coins for this category of coins
                while remaining_amount >= self.coin_types[index]:
                    remaining_amount -= self.coin_types[index]
                    coin_distribution[index] += 1
                # Move to the next type of coin
                return _find(remaining_amount, index+1, coin_distribution)

        return _find(target_amount, 0, [0] * len(self.coin_types))


Test:
coin_types = [1, 5, 10, 25, 50, 100]
target_amount = 376
coins_distribution = CoinsDistribution(coin_types)
distribution = coins_distribution.find(target_amount)

acc = [f'{distribution}*{coins_distribution.coin_types[index]}'
           for index, distribution in enumerate(distribution) if distribution > 0]
print(' + '.join(acc))  # 3*100 + 1*50 + 1*25 + 1*1



Test if a list has a cycle

Problem: Find if a list, input_values has a cycle. The problem is equivalent to finding the first duplicate in a list.

Solution: Use two iterators: the second iterator advancing twice as fast the first one. Time complexity O(n).

Implementation:
def has_cycle(input_values: List[int]) -> bool:
  """ Test if a list has a duplicate or a cycle, using two iterator """
  iter1 = iter(input_values)
  iter2 = iter(input_values)

  try:
    while True:
      value1 = next(iter1)
      next(iter2)               # Iter2 advances two elements per iteration
      value2 = next(iter2)
            
      if value1 == value2:
          return True        # Find & exit

  except StopIteration as e:
    return False

Test:
values1 = [2, 4, 6, 3, 11, 7, 9, 14, 17]
values2 = [2, 4, 6, 3, 11, 6, 9, 14, 17]

print(has_cycle(values1))  # False
print(has_cycle(values2))  # True


Intersection of sorted arrays

ProblemFind the intersection of two sorted lists list1 and list2 of integers.

Solution: Map the two lists of integer n to lists of tuples (n, list_index) and push the tuples into a priority queue using heaps module. Record two consecutive items, from two different list popped from the priority queue having the same value. The time complexity is O(long + n) ~ O(n)

Implementation:
def intersect_sorted_lists(list1: List[int], list2: List[int]) -> List[int]:
  """ 
    Extract the intersection of two sorted list of different size using 
    a priority queue and recursion
  """
    
  # Taken care of the basic case
  if list1[-1] < list2[0] or list1[0] > list2[-1]:
     return []

  else:
     # Otherwise convert integers, n into tuple (n, list_index)
     import heapq

     pqueue = []
     [heapq.heappush(pqueue, (item, 0)) for item in list1]
     [heapq.heappush(pqueue, (item, 1)) for item in list2]

     # Recursively popping up tuples from the priority queue
     def intersect(new_tuple: Tuple, tuples_list: List[Tuple]) -> List[Tuple]:
       if len(pqueue) == 0: # Our recursion exit condition
         return tuples_list
       else:
         # Next tuple in the priority queue
         item, index = heapq.heappop(pqueue)
                
         # If two consecutive integers from different list have the same value...
         if new_tuple[1] == item and index != new_tuple[0]:
           tuples_list.append(new_tuple)
         return intersect((index, item), tuples_list)


     first_item, first_index = heapq.heappop(pqueue)
     intersect_tuples = intersect((first_item, first_index), [])
        
     return [item for _, item in intersect_tuples]

Test:
values1 = [0, 2, 8, 10, 12, 34, 46, 48, 54, 99]
values2 = [3, 8, 6, 12, 14, 15, 18, 19, 22, 40, 41, 44, 45, 46, 50, 53]

print(intersect_sorted_lists(values1, values2))  # [8, 12, 46]


Sequence of consecutive integers with highest sum

Problem: Find the sequence of num_items consecutive integers from a given list, input_list which produces the highest sum. For example, extracting 3 consecutive integers with highest sum from the list [2, 1, 5, 9, 3, 2] will produce [5, 9, 3]

Solution: Implements an efficient tail recursion to generate a tuple (sum of the sequence, starting index of the  sequence).

Implementation:
def extract_seq_max_sum(input_list: List[int], num_items: int) -> (int, int):
  """ Extract the sequence of consecutive integers with the maximum summation """

  def _extract_seq_max_sum(
     input_list: List[int],
     num_items: int,
     max_sum: int,
     start_index: int,
     cnt: int) -> (int, int):
        
     # Condition to exit the recursion
     if len(input_list) < num_items:
       return max_sum, start_index
 
     new_sum = sum(input_list[:num_items])
     new_max_sum, new_start_index = \ 
        \(new_sum, cnt) if new_sum > max_sum \ 
         else (max_sum, start_index)
     
     return _extract_seq_max_sum(input_list[1:], num_items, new_max_sum, new_start_index, cnt + 1)

  return _extract_seq_max_sum(input_list, num_items, 0, 0, 0)



Test:
values = [4, 2, 8, 6, 12, 34, 6, 8, 4, 9, 11, 2, 17, 22, 5, 8, 6, 1, 4, 13, 19]

print(extract_seq_max_sum(values, 3))  # 52, 3  [6, 12, 34]
print(extract_seq_max_sum(values, 5))  # 66, 6  [8, 6, 12, 34, 6]


Longest sequence of increasing values

Problem: Extract the longest sequence of increasing values from an arbitrary list of integers. For instance the longest sequence of increasing value in [2, 1, 4, 5, 8, 3, 5, 0, 2] is [4, 5, 8].

Solution: Implements a recursion longest_increasing_seq over the input list of integers by tracking the sequence contains the current value and comparing with the current longest sequence of increasing value. The recursion walks through the input values using an iterator.

Implementation:
def longest_increasing_seq(input_values: List[int]) -> List[int]:
  """ Extract the longest increasing sequence of integer from a list using recursion."""
 
 def longest_increasing_seq(
    input_values_iter,
    cur_increasing_seq: List[int],
    longest_seq: List[int]) -> List[int]:

    while True:
      try:
         # Next element in the list
         next_element = next(input_values_iter)

         # If current increasing list empty or new element > last element
         # add the new element in the current list and continue
         if len(cur_increasing_seq) == 0 or next_element > cur_increasing_seq[-1]:
            cur_increasing_seq.append(next_element)
         
         # Otherwise complete the current increasing sequence and
         # update the longest list if necessary
         else:
            new_longest_seq = cur_increasing_seq.copy() \
               if len(cur_increasing_seq) > len(longest_seq) \
               else longest_seq

            # Re-initialize the current increasing list
            cur_increasing_seq.clear()
            cur_increasing_seq.append(next_element)
          
            # Invoke the next recursion
return longest_increasing_seq(input_values_iter, cur_increasing_seq, new_longest_seq)  
     except StopIteration as e: # Exit point for recursion
        return longest_seq

  return longest_increasing_seq(iter(input_values), [], [])


Test:
values = [6, 1, 4, 9, 11, 22, 17, 8, 6, 1, 4, 13, 19]  
print(str(longest_increasing_seq(values)))      # 1, 4, 9, 11, 22
 

List of items which value equals its index

Problem: Find the integers in a list which value equals its index. For example the algorithm should select the second item 1 from the list  [3, 1, 5, 0].

Solution: Simple traversal with a time complexity O(n).

Implementation:
def get_values_eq_index(input_list: List[int]) -> List[int]:
  """
    Retrieve the list of element for which the value is equal to its index in the list
  """
  match len(input_list):
    case 0:
      return []
    case 1:
      return input_list
    case _:
      return [el for index, el in enumerate(input_list) if index == el]


Test:
input_list = [2, 9, 2, 5, 4, 8, 1, 5, 8, 10]
print(get_values_eq_index(input_list))           # 8


Find first duplicate in a list

Problem: Find the first duplicate in a string.

Solution: Use a set to keep track of visited nodes. Worst case time complexity O(n). This is an example for which recursion is not warranted.

Implementation:
def get_first_duplicate(input_str: AnyStr) -> Optional[AnyStr]:
  """Get the first char duplicate in a string if found, None otherwise"""

  match len(input_str):
    case 0: return None
    case 1: return input_str[0]   # Return the only element
    case _:                                # After dealing with the first special cases
       unique_set = set()
       for ch in input_str:
          if ch in unique_set:
             return ch
          unique_set.add(ch)

       return None

Test:
input_str1 = "helo"
input_str2 = "hello"

print(get_first_duplicate(input_str1))  # None
print(get_first_duplicate(input_str2))  # 'l'


Check if a binary tree is balanced

Problem: Test whether a binary tree defined its node, Node, is balanced. Every path between the root and any leaf should have the same number of nodes.

Solution: Apply an in-order-traversal to collect the number of nodes of all the various lineages, root to leaves then compute the difference of longest and shortest lineages. The traversal of the tree, method, __walk uses recursion. Time complexity O(n). 

Implementation:
class Node(object):
  """ Basic binary tree node """
  def __init__(self, data: int):
    self.left = None
    self.right = None
    self.data = data


  def add(self, data: int) -> NoReturn:
     """
        Simple insertion to a binary tree, appending a new node if necessary
     """
    if data < self.data:
      if self.left:
        self.left.add(data)
      else:
        self.left = BinTreeNode(data)
    elif data > self.data:
        if self.right:
          self.right.add(data)
        else:
          self.right = BinTreeNode(data)
    else:
        self.data = data


  def is_balanced(self) -> bool:
    path_acc = []
    self.__walk('', path_acc)

    if path_acc:
      # Measure the length of the various paths from root to leaves
      path_sizes = [len(x) for x in path_acc if x != '']
      return max(path_sizes) - min(path_sizes) < 1

    else:
      return True


    # ---------   Private helper method ---------

  def __walk(self, path: AnyStr, lineages: List[AnyStr]):
    """ DFS walk """
    if self.left:   # This node has a left child
      self.left.__walk(f'{path}L', lineages)
      lineages.append(path)
        
    if self.right:   # This node has a right childe
      self.right.__walk(f'{path}R', lineages)
      lineages.append(path)


Test:
root = Node(1)
values = [2, 9, 8, 10, 13, 7]
[root.add(n) for n in values]
print(root.is_balanced())        # False

root = Node(10)
values = [6, 14, 4, 8, 12, 16]
[root.add(n) for n in values]
print(root.is_balanced())        # True


Thank you for reading this article. For more information ...

References


---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3

Tuesday, April 18, 2023

Extend LangChain Sequences with Data Types

Target audience: Advanced
Estimated reading time: 4'  

Since its public debut last year, Generative AI, especially OpenAI's ChatGPT [ref 1], has captivated the collective imagination. Frameworks like LangChain [ref 2], built around LLM APIs, have subsequently emerged to streamline processes and workflows.

This post introduces the concept generative workflow by extending LangChain with typed input and condition on input values.The post assumes the reader is somewhat has some basic knowledge of Python, OpenAI ChatGPT and LangChain.

Table of contents

Introduction

Follow me on LinkedIn

Notes: 

  • As I publish this post, OpenAI releases a new version of gpt-3.5-turbo that supports functions with typed input and output (ChatGPT functions)[ref 3]
  • The code snippets uses Python 3.9 and LangChain 0.0.200
  • To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.
  • This post is not generated by ChatGPT but assumes the reader is already familiar with Large Language Model
  • The source code is available on GitHub https://github.com/patnicolas/chatgpt-patterns

Introduction

The LangChain Python framework built on OpenAI API to build large language models applications. The framework organizes ChatGPT API functionality into functional components, similar to Object-Oriented design). These components are assembled into customizable sequence or chains that can be dynamically configured. It allows developers to sequence of tasks (chains) with message/prompt as input (role=user) and answer (role=assistant) as output [ref 4]. This concept is analog to traditional function call

Let's consider a typical function call in Python
    def func(**kwargs: dict[str, any]) -> output_type:
        ....
       return x

It easy to establish an equivalence between the component of a method and the components of a chain.

PythonLLM
Function callLLM message/request
Function name (func)Prompt prefix
Argument (**kwargs)List of tuple (variable_name, variable_type, condition)
Returned type (output_type)LangChain output key

LangChain does not explicitly support types such as integer, dictionary, float.. in input messages. The next section extends LangChain functionality by adding types in ChatGPT request messages and given the data type, a condition or filter on the variable.

Example:

  • prompt prefix "Compute the sum of elements of an array"
  • Arguments:  (x, list[float], element > 0.5)

generate the following  prompt.  "Compute the sum of the elements of an array x of type list[float] for which elements > 0.5"

The next section describes the Python implementation of a workflow of typed chains for ChatGPT using LangChain framework.


Generative workflow

The first step is to install LangChain Python module and setup the OpenAI API key as an environment variable of target machine. The LangChain Quickstart guide [ref 5] is very concise and easy to follow so there is no need to duplicate the information in this post.

LLM chains and sequence are important functions of LangChain framework. They allow developers to build sequence of chains. It allows developers to assemble basic function, LLMChain into fully functional workflow or sequence (type SequentialChain)

Let's extend SequentialChain with typed and condition on input values by implemented a workflow is defined by the class ChatGPTTypedChains. The constructor has two arguments:

  • Temperature, _temperature, to initialize the ChatGPT request
  • Optional task definition, task_builder that define the task implemented as a chain
The task builder has two arguments:
  • task description (prompt)
  • List of input variables defined as tuple (name variables, data type, and optional condition of the variable
and the type of output value
The class member, input_keys captures the names of input to the workflow

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import SequentialChain, LLMChain
from collections.abc import Callable

"""
    This class extends the langchain sequences by defining explicitly 
    - Task or goal of the request to ChatGPT
    - Typed arguments for the task

    The components of the prompt/message
    - Definition of the task (i.e. 'compute the exponential value of')
    - Input variables defined as tuple (name, type, condition) (i.e.  'x', 'list[float]', 'value < 0.8')
"""


class ChatGPTTypedChains(object):
  def __init__(self, _temperature: float, task_builder: Callable[[str, list[(str, str, str)]], str] = None):
      """
      Constructor for the typed sequence of LLM chains
      @param task_builder Builder or assembler for the prompt with {task definition and
                list of arguments {name, type, condition} as input and prompt as output
      @param _temperature Temperature for the softmax log probabilities
      @type _temperature: floating value >= 0.0
      """
      self.chains: list[LLMChain] = []
      self.llm = ChatOpenAI(temperature=_temperature)
      self.task_builder = task_builder if task_builder else ChatGPTTypedChains.__build_prompt
      self.input_keys: list[str] = []



Additional tasks are appended to the workflow with their 3 components
  • Task descriptor, task_definition (prompt)
  • Task parameters, arguments as tuple (name of input, type of input and optional condition)
  • Return type, as _output_key
The private, static method, _build_prompt assembles the task components to generate the actual prompt, this_input_prompt, then processed by the LangChain template generator.

def append(self, task_definition: str, arguments: list[(str, str, str)], _output_key: str) -> int:
   """
   Add a new task (LLM chain) into the current workflow...
   @param _output_key: Output key or variable
   @param task_definition: Definition or specification of the task
   @type arguments: List of tuple (variable_name, variable_type, variable_condition)
   """
       # We initialize the input variables for the workflow
   if len(self.input_keys) == 0:
      self.input_keys = [key for key, _, _ in arguments]

        # Build the prompt for this new prompt
   this_input_prompt = ChatGPTTypedChains.__build_prompt(task_definition, arguments)
   this_prompt = ChatPromptTemplate.from_template(this_input_prompt)

        # Create a new LLM chain and add it to the sequence
   this_llm_chain = LLMChain(llm=self.llm, prompt=this_prompt, output_key=_output_key)
   self.chains.append(this_llm_chain)
   return len(self.chains)


@staticmethod
def __build_prompt(task_definition: str, arguments: list[(str, str, str)]) -> str:
   def set_prompt(var_name: str, var_type: str, var_condition: str) -> str:
       prompt_variable_prefix = "{" + var_name + "} with type " + var_type
       return prompt_variable_prefix + " and " + var_condition \
                if not bool(var_condition) \
                else \
                prompt_variable_prefix

   embedded_input_vars = ", ".join(
            [set_prompt(var_name, var_type, var_condition) \ \
                for var_name, var_type, var_condition in arguments]
   )
   return f'{task_definition} {embedded_input_vars}'

The method __call__ implements the workflow as a LangChain sequence chain. This method takes two arguments: Input to the workflow (input to the first task) _input_values and the name/keys for the output values (output from the last task in the sequence).

def __call__(self, _input_values: dict[str, str], output_keys: list[str]) -> dict[str, any]:
   """
   Execute the sequence of typed task (LLM chains)
   @param _input_values: Input values to the sequence
   @param output_keys: Output keys for the sequence
   @return: Dictionary of output variable -> values
   """
   chains_sequence = SequentialChain(
            chains=self.chains,
            input_variables=self.arguments,
            output_variables=output_keys,
            verbose=True
    )
     return chains_sequence(_input_values)


Simple use cases

We select two simple use cases each implemented as workflow (LLM chain sequence) with the following tasks
  •   Two numerical tasks (math functions: sum and exp)
  •   Term frequency-Inverse document frequency (TF-IDF) scoring and ordering  task

Numerical computation chain

The sequence of chains, numeric_tasks of the two tasks consists of computing
  1. The sum of an array x of type list[float] or which values < 0.8
  2. Apply the exponential function to the sum

In this particular example, an array of 120 floating point values are generated through a sin function then filter through the condition x < 0.8. The output value is a dictionary with a single key 'u'.

def numeric_tasks() -> dict[str, str]:
   import math

   chat_gpt_seq = ChatGPTTypedChains(0.0)
   
   # First task:  implement lambda x: sin(x*0.001)
   input_x = ','.join([str(math.sin(n * 0.001)) for n in range(128)])
   chat_gpt_seq.append("Sum these values ", [('x', 'list[float]', 'values < 0.8')], 'res')
    , 
   # Second task: function u: exp(sum(x))
   chat_gpt_seq.append("Compute the exponential value of ", [('res', 'float', '')], 'u')
   input_values = {'x': input_x}
   output: dict[str, str] = chat_gpt_seq(input_values, ["u"])

   return output



TF-IDF score

This second use case consists of two tasks (LLM chains)

  1. Computation of TF-IDF score, tf_idf_score of terms extracted from 3 documents/files (file1.txt, file2.txt, file3.txt). The key for input values, documents, is the content of the 3 documents.
  2. Ordering the items by their TF-IDF score. The output key, ordered_list is the list of terms ranked by their decreasing TF-IDF score.

def load_content(file_name: str) -> str:
  with open(file_name, 'r') as f:
      return f.read()


def load_text(file_names: list[str]) -> list[str]:
  return [load_content(file_name) for file_name in file_names]


def tf_idf_score() -> str:
   chat_gpt_seq = ChatGPTTypedChains(0.0)

   # Load documents for which TF-IDF score has to be computed
   input_files = ['../input/file1.txt', '../input/file2.txt', '../input/file2.txt']
   input_documents = '```'.join(load_text(input_files))

   # Create first task: Compute the 
   chat_gpt_seq.append(
        "Compute the TF-IDF score for words from documents delimited by triple backticks with output format term:TF-IDF score ```",
        [('documents', 'list[str]', '')], 'terms_tf_idf_score')
  
   # Create a second task  
  chat_gpt_seq.append("Sort the terms and TF-IDF score by decreasing order of TF-IDF score",
                         [('terms_tf_idf_score', 'list[float]', '')], 'ordered_list')

   output = chat_gpt_seq({'documents': input_documents}, ["ordered_list"])
   return output['ordered_list']


Thank you for reading this article. For more information ...

References




---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3