Monday, August 19, 2024

Performance Improvement in Numpy 2.x & Lapack 3.9

Target audience: Beginner
Estimated reading time: 4'
The performance of training geometric learning and complex deep learning models directly affects time to market and development costs. Numpy 2.x, combined with the accelerator and the latest BLAS/LAPACK libraries, can reduce execution time by up to 38%.

      Evaluation
      Implementation
       Evaluation


What you will learn: Enhanced performance in linear algebra computations using Numpy 2.x with the ILP64 accelerator and LAPACK 3.9.1 library

Notes

  • Environments: Python  3.11,  Matplotlib 3.9, Numpy 1.2.6/2.1, LAPACK 3.9.1
  • Source code is available at Github.com/patnicolas/Data_Exploration/numpy2
  • To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.


Overview

Non-linear deep learning and geometric learning depend on geodesics, Riemannian metrics, and exponential maps to train models and predict outcomes. These operations are computationally intensive, making them ideal candidates for performance enhancements in the latest release of Numpy, version 2.1 [ref 1]. 
Linear algebra operations, on the other hand, rely on the BLAS and LAPACK libraries, which can be optimized for specific platforms [ref 2].

This study is performed on a MacOS M2 Maxlaptop, utilizing optimizations of various mathematical libraries like LAPACK and BLAS from the Accelerate framework [ref 3]. Specifically, we use the ILP64 interface with the LAPACK 3.9.1 library alongside Numpy 2.1 [ref 4].

In the following two sections, we will assess the performance enhancements in Numpy 2.1 when utilizing the ILP64 accelerator for the LINPACK library on macOS. The evaluation will cover two scenarios:
- Basic Linear Algebra operations
- Fast Fourier Transform

Linear Algebra

Let's compare the performance of Numpy 2.1 with the LINPACK accelerator 3.9.1 against Numpy 1.2.6 by performing the following computations:
  • Sigmoid function on a Numpy array of varying sizes
  • Summation of two 3D Numpy arrays
  • Multiplication of two 3D Numpy arrays
  • Dot product of two arrays.

Implementation

Let's start by creating keys for our linear algebra operators.

SIGMOID: AnyStr = 'sigmoid'
ADD: AnyStr = 'add'
MUL: AnyStr = 'mul'
DOT: AnyStr = 'dot'


We wraps the basic operations on Numpy array into the class LinearAlgebraEval with two constructors
  • Default, __init__ with input array and shape
  • build method to generate a random 3D array.
The four methods/operations—sigmoid, add, mul, and dot product—are straightforward and self-explanatory.

class LinearAlgebraEval(object):
    def __init__(self, x: np.array, shape: List[int]) -> None:
        self.x = x.reshape(shape)

    @classmethod
    def build(cls, size: int) -> Self:
        _x = np.random.uniform(0.0, 100.0, size)
        return cls(_x, [10, 100, -1])

    def sigmoid(self, a: float) -> np.array:
        return 1.0/(1.0+ np.exp(-self.x * a))

    def add(self, y: np.array) -> np.array:
        return self.x + y.reshape(self.x.shape)

    def mul(self, y: np.array) -> np.array:
        z = 2.0* self.x * y.reshape(self.x.shape)
        return z * z

    def dot(self, y: np.array) -> np.array:
        return np.dot(self.x.reshape(-1), y.reshape(-1))
   

The performance evaluation is conducted by an instance of the LinearAlgebraPerfEval class. The key method __call__ relies on a dictionary with operation as key and list of execution times for these operations.

class LinearAlgebraPerfEval(object):
    def __init__(self, sizes: List[int]) -> None:
        self.sizes = sizes

    def __call__(self) -> Dict[AnyStr, List[float]]:
        _perf = {}
        lin_algebra = LinearAlgebraEval.build(self.sizes[0])
        _perf[SIGMOID] = [LinearAlgebraPerfEval.__performance(lin_algebra, SIGMOID)]
        _perf[ADD] = [LinearAlgebraPerfEval.__performance(lin_algebra, ADD)]
        _perf[MUL] = [LinearAlgebraPerfEval.__performance(lin_algebra, MUL)]
        _perf[DOT] = [ LinearAlgebraPerfEval.__performance(lin_algebra, DOT)]

        for index in range(1, len(self.sizes)):
            # Alternative constructor
            lin_algebra = LinearAlgebraEval.build(self.sizes[index])

            # The method __record executes each of the four operations for
            # a variable size of Numpy array.

            _perf = LinearAlgebraPerfEval.__record(_perf, lin_algebra, SIGMOID)
            _perf = LinearAlgebraPerfEval.__record(_perf, lin_algebra, ADD)
            _perf = LinearAlgebraPerfEval.__record(_perf, lin_algebra, MUL)
            _perf = LinearAlgebraPerfEval.__record(_perf, lin_algebra, DOT)
        return _perf


The private, helper method, __record described in the Appendix, executes and time  each of the four operations for set of Numpy arrays of increasing size. The timing relies on the timeit decorator described in the Appendix.

Evaluation

Let's plot the execution time for these 4 operations with a size of 3D arrays (10, 100, -1) varying between 1million to 60 million values.

Numpy 1.2.6 with OpenBlas


Numpy 2.1 with ILP64- BLAS/LAPACK 3.9.1



Numpy 2.1 and ILP-64 have no impact on the performance of the Sigmoid but reduce the average time to execute and addition, multiplication and dot product on these large arrays by 25 to 40%


Fast Fourier Transform

Let's define a signal as a sum of 4 sinusoidal functions with 4 different frequency modes.\[ f(t)=\sum_{i=1}^{4} sin(2.\pi.f_{i}.t)\]The signal is then sampled in the interval [0, 1] with various number of data points.

Implementation

We encapsulate the generation, sampling, and extraction of the frequency spectrum within the `FFTEval` class. 
The constructor iteratively adds sine functions to implement the desired formula. 
The `compute` method samples the signal over the interval [0, 1] and calls the Numpy `fft.fftfreq` method to generate the frequency distribution. The timing of the execution in the method compute uses the timeit decorator.

class FFTEval(object):
    def __init__(self, frequencies: List[int], samples: int) -> None:
        self.F = 1.0/samples
        self.x = np.arange(0, 1.0, self.F)
        pi_2 = 2*np.pi
        
       # Composed the sinusoidal signal/function
        self.signal = np.sin(pi_2*frequencies[0]*self.x)
        if len(frequencies) > 1:
            for f in frequencies[1:]:
                self.signal += np.sin(pi_2*f*self.x)

    @timeit
    def compute(self) -> (np.array, np.array):
        num_samples = len(self.signal)

        # Select the positive amplitudes and frequencies
        num_half_samples = num_samples//2

        # Invoke the Numpy FFT function to extract frequencies
        freqs = np.fft.fftfreq(num_samples, self.F)
        _freqs = freqs[:num_half_samples]
        _amplitudes = np.abs(np.fft.fft(self.signal)[:num_half_samples]) / num_samples
        return _freqs, _amplitudes

The class FFTPerfEval implements the execution and collection of the execution time for the FFT with various frequency of samples (number of samples in the interval [0, 1])

class FFTPerfEval(PerfEval):
    def __init__(self, sizes: List[int]) -> None:
        super(FFTPerfEval, self).__init__(sizes)

    def __call__(self) -> List[float]:
        durations = []

        # Collect the execution time for each of the number of
        # samples defined in the constructor
        for samples in self.sizes:
            durations.append(FFTPerfEval.__compute(samples))
        return durations

    @staticmethod
    def __compute(sz: int) -> float:
        frequencies = [4, 7, 11, 17]
        fft_eval = FFTEval(frequencies, sz)
        # Time it
        return fft_eval.compute()


Evaluation

Numpy 1.2.6 with OpenBlas



Numpy 2.1 with BLAS/LAPACK 3.9.1



Relative performance improvement


The switch to Numpy 2.1 with ILP-64 on MacOS 14.6.1 shows an average improvement of 33%

References

[1Numpy
[2Lapack


------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. 
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning", Packt Publishing ISBN 978-1-78712-238-3 
and Geometric Learning in Python Newsletter on LinkedIn.

Appendix

Standard timing decorator in Python

def timeit(func):
    import time

    def wrapper(*args, **kwargs):
        start_time = time.time()
        func(*args, **kwargs)
        end_time = time.time()
        return end_time - start_time
    return wrapper


Methods to compute and record the execution time for basic linear algebra operations. The execution time is recorded through the decorator timer

 @staticmethod
    def __record(perf: Dict[AnyStr, List[float]],
                 lin_algebra: LinearAlgebraEval,
                 op: AnyStr) -> Dict[AnyStr, List[float]]:
        duration = LinearAlgebraPerfEval.__performance(lin_algebra, op)
        lst = perf[op]
        lst.append(duration)
        perf[op] = lst
        return perf

    @timeit
    @staticmethod
    def __performance(lin_algebra: LinearAlgebraEval, op: AnyStr) -> np.array:
        match op:
            case "sigmoid":
                return lin_algebra.sigmoid(8.5)
            case "add":
                return lin_algebra.add(lin_algebra.x)
            case "mul":
                return lin_algebra.mul(lin_algebra.x)
            case "dot":
                return lin_algebra.dot(lin_algebra.


Wednesday, July 24, 2024

Uniform Manifold Approximation and Projection

Target audience: Expert
Estimated reading time: 8'
NewsletterGeometric Learning in Python                                                                       

Are you frustrated with the linear assumptions of Principal Component Analysis (PCA) or losing the global structure and relationships in your data with t-SNE? 
Leveraging Riemannian geometry, Uniform Manifold Approximation and Projection (UMAP) might be the solution you're looking for.


       t-SNE
       UMAP
       Evaluation code
       Output
       Evaluation
       Output
       Tuning



What you will learn: How to leverage to Uniform Manifold Approximation and Projection method as a reliable non-linear dimensionality reduction technique that preserve the global distribution over low dimension space (Manifold).


The complete article, featuring design principles, detailed implementation, in-depth analysis, and exercises, is available in Dive into Uniform Manifold Approximation & Projection



Notes

  • Environments: Python  3.11,  SciKit-learn 1.5.1,  Matplotlib 3.9.1, umap-learn 0.5.6
  • Source code is available at  Github.com/patnicolas/Data_Exploration/umap
  • To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.

Introduction

Dimension reduction algorithms can be classified into two categories:

  1. Models that maintain the global pairwise distance structure among all data samples, such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS).
  2. Models that preserve local distances, including t-distributed Stochastic Neighbor Embedding (t-SNE) and Laplacian Eigenmaps.

Uniform Manifold Approximation and Projection (UMAP) leverages research done for Laplacian eigenmaps to tackle the problem of uniform data distributions on manifolds by combining Riemannian geometry with advances in fuzzy simplicial sets [ref 1].

t-SNE is considered the leading method for dimension reduction in visualization. However, compared to t-SNE, UMAP maintains more of the global structure and offers faster run time performance.


t-SNE

t-SNE is a widely used technique for reducing the dimensionality of data and enhancing its visualization. It's often compared to PCA (Principal Component Analysis), another popular method. This article examines the performance of UMAP and t-SNE using two well-known datasets, Iris and MNIST.

t-SNE is an unsupervised, non-linear technique designed for exploring and visualizing high-dimensional data [ref 2]. It provides insights into how data is structured in higher dimensions by projecting it into two or three dimensions. This capability helps reveal underlying patterns and relationships, making it a valuable tool for understanding complex datasets.

UMAP

UMAP, developed by McInnes and colleagues, is a non-linear dimensionality reduction technique that offers several benefits over t-SNE, including faster computation and better preservation of the global structure of the data.

Benefits
  • Ease of Use: Designed to be easy to use, with a simple API for fitting and transforming data.
  • Scalability: Can handle large datasets efficiently.
  • Flexibility: Works with a variety of data types and supports both supervised and unsupervised learning.
  • Integration: Integrates with other Python scientific computing libraries in Python, such as Numpy, PyTorch and Scikit-learn.

The algorithm behind UMAP involves:

  1. Constructing a high-dimensional graph representation of the data.
  2. Capturing global relationships within the data.
  3. Encoding these relationships using a simplicial set.
  4. Applying stochastic gradient descent (SGD) to optimize the data's representation in a lower dimension, akin to the process used in autoencoders.

This method allows UMAP to maintain both local and global data structures, making it effective for various data analysis and visualization tasks.

For the mathematically inclined, the simplicial set captures the local and global relationships between two points xi and xj as: \[sim(x_{i}, x_{j})=e^{ - \frac{||x_{i}-x_{j}|| ^{2}}{\sigma_{i} \sigma_{j}}}\]with sigma parameters as scaler.
The cost function for all the points x with their representation yi is \[L=\sum_{i=1}^{n}\sum_{j=1}^{n}\left ( sim(x_{i}, x_{j}) - sim(y_{i}, y_{j}) \right )^2\]

Python module is installed through the usual pip utility.
    pip install umap-learn

The reader is invited to consult the documentation [ref 3].

Datasets

Let's consider the two datasets used for this evaluation:
  • MNIST: The MNIST database consists of 60,000 training examples and 10,000 test examples of handwritten digits. This dataset was originally created by Yann LeCun [ref 4].
  • IRIS: This dataset includes 150 instances, divided into three classes of 50 instances each, with each class representing a different species of iris plant [ref 5].

class DataSrc(Enum):
    MNIST = 'mnist'
    IRIS = 'iris'

First, we implement a data set loader for these two data sets using sklearn datasets module.

from sklearn.datasets import load_digits, load_iris


class DatasetLoader(object):
   def __init__(self, dataset_src: DataSrc) -> None:
       match dataset_src:
            case DataSrc.MNIST:
               digits = load_digits()
               self.data = digits.data
               self.color = digits.target.astype(int)
             
             case DataSrc.IRIS:
               images = load_iris()
               self.data = images.data
               self.names = images.target_names
               self.color = images.target.astype(int)
        self.dataset_src = dataset_src



t-SNE

Evaluation code

Let's wrap the evaluation of t-SNE algorithm for IRIS and MNIST data in a class TSneEval. The constructor initializes the Scikit-learn class TSNE with the appropriate number of components (dimension 2 or 3). The class TSneEval inherits for sklearn data loaders by subclassing DatasetLoader.

class TSneEval(DatasetLoader):
   def __init__(self, data_src: DataSrc, n_components: int) -> None:
        super(TSneEval, self).__init__(data_src)

        # Instantiate the Sklearn t-SNE model
        self.t_sne = TSNE(n_components=n_components)

The evaluation method, __call__, utilizes the Matplotlib library to visualize data clusters. It generates plots to display the clusters for each digit in the MNIST dataset and for each flower category in the Iris dataset.

def __call__(self, cmap: AnyStr) -> NoReturn:
     import matplotlib.pyplot as plt

     embedding = self.t_sne.fit_transform(self.data)
     x = embedding[:, 0]
     y = embedding[:, 1]
     n_ticks = 10

     plt.scatter(x=x, y=y, c=self.color, cmap=cmap, s=4.0)
     plt.colorbar(
        boundaries=np.arange(n_ticks+1) - 0.5).set_ticks(np.arange(n_ticks)
     )
     plt.title(f'UMAP {self.dataset_src} {self.umap.n_neighbors} neighbors'
                 f'min_dist: {self.umap.min_dist}')
     plt.show()


Output

The simple previous code snippet produces the t-SNE plots for MNIST and IRIS data sets.

MNIST dataset
First let's look at the 10 data clusters (1 per digit) in a two and 3 dimension plot.

Fig. 1  2-dimension t-SNE for MNIST data set


Fig. 2  3-dimension t-SNE for MNIST data set


IRIS dataset
The data related to each of the 3 types or Iris flowers are represented in 2 and 3 dimension plots

Fig. 3  2-dimension t-SNE for IRIS data set


Fig. 4  3-dimension t-SNE for IRIS data set


UMAP

Evaluation

We follow the same procedure for UMAP visualization as we do for t-SNE. The constructor of the UMAPEval wrapper class initializes the UMAP model with the specified number of neighbors (n_neighbors) and minimum distance (min_dist) parameters, which are used to represent the data.

import umap

class UMAPEval(DatasetLoader):
    def __init__(self, 
                        dataset_src: DataSrc, 
                        n_neighbors: int, 
                       min_dist: float) -> None:
       super(UMAPEval, self).__init__(dataset_src)

        # Instantiate the UMAP model
        self.umap = umap.UMAP(random_state=42, 
                                                 n_neighbors=n_neighbors, 
                                                 min_dist=min_dist)

Similar to the t-SNE evaluation, the __call__ method uses the Matplotlib library to visualize data clusters.

def __call__(self, cmap: AnyStr) -> NoReturn:
    embedding = self.umap.fit_transform(self.data)

    plt.scatter(
         x =embedding[:, 0], 
         y =embedding[:, 1], 
         c =self.color, 
         cmap =cmap, 
         s=4.0)
        .......  // Similar code as t-SNE
    plt.show()


Output


MNIST dataset

Fig. 5  UMAP plot for MNIST data  with 8 neighbors and min distance 0.3


Fig. 6 UMAP plot for MNIST data with 8 neighbors and min distance 0.6


Fig. 7 UMAP plot for MNIST data with 4 neighbors and min distance 0.8



IRIS dataset

Fig. 8  UMAP plot for IRIS data with 24 neighbors and min distance 0.005


Fig. 9 UMAP plot for IRIS data with 24 neighbors and min distance 0.001


Fig. 10 UMAP plot for IRIS data with 40 neighbors and min distance 0.001


Tuning

Number of neighbors
The parameter that defines the number of neighbors in UMAP, n_neighbors, determines the balance between global and local distances in the data visualization. A smaller number of neighbors means the local neighborhood is defined by fewer data points, which is ideal for detecting small clusters and capturing fine details. On the other hand, a larger number of neighbors helps in capturing broader, global patterns, but it may oversimplify the local relationships, potentially missing finer details in the data.

min_dist (compactness in low dimension)
This parameter plays a crucial role in determining the appearance of the low-dimensional representation, specifically affecting the clustering and spacing of data points.
A small min_dist value allows UMAP to pack points closer together in the low-dimensional space. It emphasizes the preservation of local structure and can make clusters more distinct.
A larger min_dist value enforces a minimum separation between points in the low-dimensional space. UMAP visualizations with a large min_dist  may show less clustering and more continuous distributions of data points.
The choice of min_dist thus influences the trade-off between preserving local versus global structures.