Estimated reading time: 7'
- Environment: Python 3.11, NumPy 1.26.4, Geomstats 2.7.0, Matplotlib 3.8.2
- This article is a follow-up to Foundation of Geometric Learning
- Source code available at Github.com/patnicolas/Data_Exploration/manifolds
- To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.
Geometric learning
- Understanding data manifolds: Data in high-dimensional spaces often lie on lower-dimensional manifolds. Differential geometry provides tools to understand the shape and structure of these manifolds, enabling generative models to learn more efficient and accurate representations of data.
- Improving latent space interpolation: In generative models, navigating the latent space smoothly is crucial for generating realistic samples. Differential geometry offers methods to interpolate more effectively within these spaces, ensuring smoother transitions and better quality of generated samples.
- Optimization on manifolds: The optimization processes used in training generative models can be enhanced by applying differential geometric concepts. This includes optimizing parameters directly on the manifold structure of the data or model, potentially leading to faster convergence and better local minima.
- Geometric regularization: Incorporating geometric priors or constraints based on differential geometry can help in regularizing the model, guiding the learning process towards more realistic or physically plausible solutions, and avoiding overfitting.
- Advanced sampling techniques: Differential geometry provides sophisticated techniques for sampling from complex distributions (important for both training and generating new data points), improving upon traditional methods by considering the underlying geometric properties of the data space.
- Enhanced model interpretability: By leveraging the geometric structure of the data and model, differential geometry can offer new insights into how generative models work and how their outputs relate to the input data, potentially improving interpretability.
- Physics-Informed Neural Networks: Projecting physical laws and boundary conditions, such as a set of partial differential equations, onto a surface manifold can improve the optimization of deep learning models.
- Innovative architectures: Insights from differential geometry can lead to the development of novel neural network architectures that are inherently more suited to capturing the complexities of data manifolds, leading to more powerful models.
Differential geometry basics
Geomstats library
- geometry: This part provides an object-oriented framework for crucial concepts in differential geometry, such as exponential and logarithm maps, parallel transport, tangent vectors, geodesics, and Riemannian metrics.
- learning: This section includes statistics and machine learning algorithms tailored for manifold data, building upon the scikit-learn framework.
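As an illustration of two of the concepts handled by the geometry module, the exponential and logarithm maps on the unit 2-sphere have closed forms that can be sketched in plain NumPy. This is not the Geomstats implementation: the helper names exp_map and log_map are hypothetical, and extrinsic (3D) coordinates are assumed.

```python
import numpy as np


def exp_map(base_point: np.ndarray, tangent_vec: np.ndarray) -> np.ndarray:
    """Follow the great circle leaving base_point with velocity tangent_vec for unit time."""
    norm = np.linalg.norm(tangent_vec)
    if norm < 1e-12:
        return base_point.copy()
    return np.cos(norm) * base_point + np.sin(norm) * tangent_vec / norm


def log_map(base_point: np.ndarray, point: np.ndarray) -> np.ndarray:
    """Tangent vector at base_point that the exponential map sends to point."""
    cos_angle = np.clip(np.dot(base_point, point), -1.0, 1.0)
    angle = np.arccos(cos_angle)                  # geodesic distance on the unit sphere
    direction = point - cos_angle * base_point    # part of point orthogonal to base_point
    norm = np.linalg.norm(direction)
    if norm < 1e-12:
        # Coincident points (or the antipode, where the log is not unique):
        # this sketch simply returns the zero vector.
        return np.zeros_like(base_point)
    return angle * direction / norm


north_pole = np.array([0.0, 0.0, 1.0])
tangent_vec = np.array([0.4, -0.3, 0.0])          # orthogonal to north_pole, hence tangent
end_point = exp_map(north_pole, tangent_vec)      # point reached on the sphere
recovered = log_map(north_pole, end_point)        # should give back tangent_vec
```

The round trip recovers the original tangent vector, and its norm equals the geodesic distance between the two points.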
Use case: Hypersphere
Components
- id: A label for the point
- location: An n-dimensional NumPy array
- tgt_vector: An optional tangent vector, defined as a list of float coordinates
- geodesic: A flag that specifies whether the geodesic has to be computed
- intrinsic: A flag that specifies whether the coordinates are intrinsic (True) or extrinsic (False)
@dataclass
class ManifoldPoint:
    id: AnyStr
    location: np.array
    tgt_vector: List[float] = None
    geodesic: bool = False
    intrinsic: bool = False
import geomstats.visualization as visualization
from geomstats.geometry.hypersphere import Hypersphere, HypersphereMetric
from typing import NoReturn, List
import numpy as np
import geomstats.backend as gs


class HypersphereSpace(GeometricSpace):
    def __init__(self, equip: bool = False, intrinsic: bool = False):
        dim = 2
        super(HypersphereSpace, self).__init__(dim, intrinsic)
        coordinates_type = 'intrinsic' if intrinsic else 'extrinsic'
        self.space = Hypersphere(dim=self.dimension, equip=equip, default_coords_type=coordinates_type)
        self.hypersphere_metric = HypersphereMetric(self.space)

    def belongs(self, point: List[float]) -> bool:
        return self.space.belongs(point)

    def sample(self, num_samples: int) -> np.array:
        return self.space.random_uniform(num_samples)

    def tangent_vectors(self, manifold_points: List[ManifoldPoint]) -> List[np.array]:
        ...  # See section Tangent vectors

    def geodesics(self,
                  manifold_points: List[ManifoldPoint],
                  tangent_vectors: List[np.array]) -> List[np.array]:
        ...  # See section Geodesics

    def show_manifold(self, manifold_points: List[ManifoldPoint]) -> NoReturn:
        ...  # See Appendix
- belongs: tests whether a point belongs to the hypersphere
- sample: generates points on the hypersphere using a uniform random generator
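To make these two methods concrete, here is a minimal NumPy sketch of what they amount to on the unit 2-sphere in extrinsic coordinates. The functions belongs and sample_uniform below are hypothetical stand-ins written for illustration, not the Geomstats code: membership is a unit-norm test, and uniform sampling is done by normalizing isotropic Gaussian draws, one common technique among several.

```python
import numpy as np


def belongs(point: np.ndarray, atol: float = 1e-9) -> bool:
    """A 3D point lies on the unit 2-sphere when its Euclidean norm is 1."""
    return point.shape == (3,) and bool(np.isclose(np.linalg.norm(point), 1.0, atol=atol))


def sample_uniform(num_samples: int, seed: int = 0) -> np.ndarray:
    """Normalize isotropic Gaussian vectors; the directions are uniform on the sphere."""
    rng = np.random.default_rng(seed)
    gaussian = rng.normal(size=(num_samples, 3))
    return gaussian / np.linalg.norm(gaussian, axis=1, keepdims=True)


samples = sample_uniform(3)                          # array of shape (3, 3)
on_sphere = [belongs(sample) for sample in samples]  # every sample has unit norm
off_sphere = belongs(np.array([0.5, 0.3, 0.5]))      # norm is about 0.77, so not on the sphere
```

The fixed seed is only there to make the sketch reproducible.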
Tangent vectors
def tangent_vectors(self, manifold_points: List[ManifoldPoint]) -> List[np.array]:

    def tangent_vector(point: ManifoldPoint) -> (np.array, np.array):
        import geomstats.backend as gs

        vector = gs.array(point.tgt_vector)
        # Project the ambient vector onto the tangent space at the point
        tangent_v = self.space.to_tangent(vector, base_point=point.location)
        # The exponential map gives the end point reached on the hypersphere
        end_point = self.hypersphere_metric.exp(
            tangent_vec=tangent_v,
            base_point=point.location)
        return tangent_v, end_point

    # Each entry is a (tangent vector, end point) pair
    return [tangent_vector(point) for point in manifold_points]
manifold = HypersphereSpace(True)

# Select points uniformly at random on the hypersphere
samples = manifold.sample(3)

# Generate the manifold data points
manifold_points = [
    ManifoldPoint(
        id=f'data{index}',
        location=sample,
        tgt_vector=[0.5, 0.3, 0.5],
        geodesic=False) for index, sample in enumerate(samples)]

# Display the tangent vectors
manifold.show_manifold(manifold_points)
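The vector [0.5, 0.3, 0.5] used above is an arbitrary ambient vector, so it has to be projected onto the tangent plane before it can serve as a tangent vector. On the unit sphere this projection plausibly reduces to removing the radial component, which the NumPy sketch below illustrates. The function to_tangent here is a hypothetical re-implementation for illustration, and the base point is an arbitrary choice.

```python
import numpy as np


def to_tangent(vector: np.ndarray, base_point: np.ndarray) -> np.ndarray:
    """Remove the component of vector along base_point (the sphere normal at that point)."""
    return vector - np.dot(vector, base_point) * base_point


base_point = np.array([1.0, 2.0, 2.0]) / 3.0      # arbitrary unit vector on the sphere
vector = np.array([0.5, 0.3, 0.5])                # same ambient vector as tgt_vector above
tangent_vec = to_tangent(vector, base_point)

# A tangent vector is orthogonal to the base point ...
radial_component = np.dot(tangent_vec, base_point)
# ... and projecting it a second time leaves it unchanged.
projected_twice = to_tangent(tangent_vec, base_point)
```

Because the projection depends on the base point, the same ambient vector yields a different tangent vector at each sampled location.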
Geodesics
def geodesics(self,
              manifold_points: List[ManifoldPoint],
              tangent_vectors: List[np.array]) -> List[np.array]:

    def geodesic(manifold_point: ManifoldPoint, tangent_vec: np.array) -> np.array:
        return self.hypersphere_metric.geodesic(
            initial_point=manifold_point.location,
            initial_tangent_vec=tangent_vec
        )

    return [geodesic(point, tgt_vec)
            for point, tgt_vec in zip(manifold_points, tangent_vectors) if point.geodesic]
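On the unit sphere, a geodesic defined by an initial point and an initial tangent vector is an arc of a great circle. The sketch below, in plain NumPy, returns the geodesic as a function of time that can be evaluated on a grid, similar in spirit to how the plotting code in the appendix evaluates it on 40 time stamps. The function geodesic is a hypothetical helper written for illustration, not the Geomstats method.

```python
import numpy as np


def geodesic(initial_point: np.ndarray, initial_tangent_vec: np.ndarray):
    """Return a function t -> points on the great circle leaving initial_point."""
    speed = np.linalg.norm(initial_tangent_vec)
    if speed < 1e-12:
        raise ValueError("initial_tangent_vec must be non-zero")
    direction = initial_tangent_vec / speed

    def path(t) -> np.ndarray:
        # Column of time stamps so that each row of the result is one point
        t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
        return np.cos(t * speed) * initial_point + np.sin(t * speed) * direction

    return path


initial_point = np.array([0.0, 0.0, 1.0])                 # north pole
initial_tangent_vec = np.array([0.0, np.pi / 2, 0.0])     # tangent at the pole, length pi/2
path = geodesic(initial_point, initial_tangent_vec)
points = path(np.linspace(0.0, 1.0, 40))                  # 40 points along the arc
```

With a tangent vector of length pi/2, the arc covers a quarter of a great circle, from the north pole to the point (0, 1, 0) on the equator.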
References
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning.
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning", Packt Publishing ISBN 978-1-78712-238-3 and Geometric Learning in Python Newsletter on LinkedIn.
Appendix
import geomstats.visualization as visualization


def show_manifold(self,
                  manifold_points: List[ManifoldPoint],
                  euclidean_points: List[np.array] = None) -> NoReturn:
    import matplotlib.pyplot as plt

    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection="3d")

    # Walk through the list of data points on the manifold
    for manifold_pt in manifold_points:
        ax = visualization.plot(
            manifold_pt.location,
            ax=ax,
            space="S2",
            s=100,
            alpha=0.8,
            label=manifold_pt.id)

        # If the tangent vector has to be extracted and computed
        if manifold_pt.tgt_vector is not None:
            tgt_vec, end_pt = self.__tangent_vector(manifold_pt)
            # Show the end point and tangent vector arrow
            ax = visualization.plot(end_pt, ax=ax, space="S2", s=100, alpha=0.8, label=f'End {manifold_pt.id}')
            arrow = visualization.Arrow3D(manifold_pt.location, vector=tgt_vec)
            arrow.draw(ax, color="red")

            # If the geodesic is to be computed and displayed
            if manifold_pt.geodesic:
                geodesics = self.__geodesic(manifold_pt, tgt_vec)
                # Arbitrarily plot 40 data points for the geodesic from the tangent vector
                geodesics_pts = geodesics(gs.linspace(0.0, 1.0, 40))
                ax = visualization.plot(
                    geodesics_pts,
                    ax=ax,
                    space="S2",
                    color="blue",
                    label=f'Geodesic {manifold_pt.id}')

    # Display points in the Euclidean space of the hypersphere, if any are specified
    if euclidean_points is not None:
        for index, euclidean_pt in enumerate(euclidean_points):
            ax.plot(
                euclidean_pt[0],
                euclidean_pt[1],
                euclidean_pt[2],
                **{'label': f'E-{index}', 'color': 'black'},
                alpha=0.5)

    ax.legend()
    plt.show()