This post introduces the concept of generative workflow by extending LangChain with typed inputs and conditions on input values. It assumes the reader has some basic knowledge of Python, OpenAI ChatGPT, and LangChain.
Table of contents
- Introduction
- Generative workflow
- Simple use cases
- References

Notes:
- As I publish this post, OpenAI has released a new version of gpt-3.5-turbo that supports functions with typed input and output (ChatGPT functions) [ref 3]
- The code snippets use Python 3.9 and LangChain 0.0.200
- To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.
- This post was not generated by ChatGPT but assumes the reader is already familiar with large language models
- The source code is available on GitHub https://github.com/patnicolas/chatgpt-patterns
Introduction
LangChain is a Python framework built on the OpenAI API for developing applications based on large language models. The framework organizes the ChatGPT API functionality into functional components, similar to object-oriented design. These components are assembled into customizable sequences, or chains, that can be dynamically configured: developers define a sequence of tasks (chains) with a message/prompt as input (role=user) and an answer (role=assistant) as output [ref 4]. This concept is analogous to a traditional function call:
def func(**kwargs: dict[str, any]) -> output_type:
    ...
    return x
| Python | LLM |
| --- | --- |
| Function call | LLM message/request |
| Function name (func) | Prompt prefix |
| Arguments (**kwargs) | List of tuples (variable_name, variable_type, condition) |
| Return type (output_type) | LangChain output key |
LangChain does not explicitly support types such as integer, float, or dictionary in input messages. The next section extends LangChain by adding a type to each variable of the ChatGPT request message and, given the data type, an optional condition or filter on the variable.
Example:
- Prompt prefix: "Compute the sum of the elements of an array"
- Arguments: (x, list[float], element > 0.5)
These generate the following prompt: "Compute the sum of the elements of an array x of type list[float] for which elements > 0.5"
The next section describes the Python implementation of a workflow of typed chains for ChatGPT using the LangChain framework.
Generative workflow
The first step is to install the LangChain Python module and set up the OpenAI API key as an environment variable on the target machine. The LangChain Quickstart guide [ref 5] is very concise and easy to follow, so there is no need to duplicate that information in this post.
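As a reminder of that guide, after installing the module (e.g., pip install langchain==0.0.200) the key can also be exposed programmatically; a minimal sketch in which the placeholder key is to be replaced by your own:

import os

# Placeholder value: substitute your actual OpenAI API key
os.environ['OPENAI_API_KEY'] = '<your-openai-api-key>'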
LLM chains and sequences are key constructs of the LangChain framework: they allow developers to assemble the basic unit of functionality, LLMChain, into a fully functional workflow or sequence, of type SequentialChain.
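For readers new to these constructs, the following minimal sketch chains two plain, untyped LLMChain instances into a SequentialChain (the product/company_name/slogan prompts and keys are purely illustrative, not part of this post's workflow):

from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import ChatPromptTemplate

llm = ChatOpenAI(temperature=0.0)

# First chain: derive a company name from a product description
name_chain = LLMChain(
    llm=llm,
    prompt=ChatPromptTemplate.from_template("Name a company that makes {product}"),
    output_key='company_name')

# Second chain: derive a slogan from the company name
slogan_chain = LLMChain(
    llm=llm,
    prompt=ChatPromptTemplate.from_template("Write a slogan for {company_name}"),
    output_key='slogan')

# Assemble the two chains into a sequence and execute it
sequence = SequentialChain(
    chains=[name_chain, slogan_chain],
    input_variables=['product'],
    output_variables=['slogan'])
print(sequence({'product': 'running shoes'}))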
Let's extend SequentialChain with types and conditions on input values by implementing a workflow defined by the class ChatGPTTypedChains. The constructor has two arguments:
- Temperature, _temperature, to initialize the ChatGPT request
- Optional task builder, task_builder, that defines the task implemented as a chain from:
  - a task description (prompt)
  - a list of input variables, each defined as a tuple (variable name, data type, optional condition on the variable)
from typing import Any
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import SequentialChain, LLMChain
from collections.abc import Callable

"""
This class extends the LangChain sequences by explicitly defining
- the task or goal of the request to ChatGPT
- typed arguments for the task
The components of the prompt/message are
- the definition of the task (i.e., 'compute the exponential value of')
- the input variables defined as tuples (name, type, condition) (i.e., 'x', 'list[float]', 'value < 0.8')
"""

class ChatGPTTypedChains(object):
    def __init__(self, _temperature: float, task_builder: Callable[[str, list[tuple[str, str, str]]], str] = None):
        """
        Constructor for the typed sequence of LLM chains
        @param _temperature: Temperature for the softmax log probabilities
        @type _temperature: Floating value >= 0.0
        @param task_builder: Builder or assembler for the prompt, taking the task definition and
            the list of arguments (name, type, condition) as input and producing the prompt as output
        """
        self.chains: list[LLMChain] = []
        self.llm = ChatOpenAI(temperature=_temperature)
        self.task_builder = task_builder if task_builder else ChatGPTTypedChains.__build_prompt
        self.input_keys: list[str] = []
The method append adds a new task (LLM chain) to the workflow. Its arguments are:
- Task descriptor, task_definition (prompt)
- Task parameters, arguments, as tuples (name of the input, type of the input, optional condition on the input)
- Return key, _output_key
    def append(self, task_definition: str, arguments: list[tuple[str, str, str]], _output_key: str) -> int:
        """
        Add a new task (LLM chain) into the current workflow
        @param task_definition: Definition or specification of the task
        @param arguments: List of tuples (variable_name, variable_type, variable_condition)
        @param _output_key: Output key or variable
        @return: Number of tasks (LLM chains) in the workflow
        """
        # Initialize the input variables for the workflow from the first task
        if len(self.input_keys) == 0:
            self.input_keys = [key for key, _, _ in arguments]

        # Build the prompt for this new task
        this_input_prompt = self.task_builder(task_definition, arguments)
        this_prompt = ChatPromptTemplate.from_template(this_input_prompt)

        # Create a new LLM chain and add it to the sequence
        this_llm_chain = LLMChain(llm=self.llm, prompt=this_prompt, output_key=_output_key)
        self.chains.append(this_llm_chain)
        return len(self.chains)
    @staticmethod
    def __build_prompt(task_definition: str, arguments: list[tuple[str, str, str]]) -> str:
        def set_prompt(var_name: str, var_type: str, var_condition: str) -> str:
            prompt_variable_prefix = "{" + var_name + "} with type " + var_type
            # Append the condition to the variable description only if one is defined
            return prompt_variable_prefix + " and " + var_condition \
                if bool(var_condition) \
                else prompt_variable_prefix

        embedded_input_vars = ", ".join(
            [set_prompt(var_name, var_type, var_condition)
             for var_name, var_type, var_condition in arguments]
        )
        return f'{task_definition} {embedded_input_vars}'
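As a quick illustration, applying the default prompt builder to the arguments of the first numerical task described later yields the template shown in the comment (the double underscore triggers Python name mangling, hence the explicit attribute name; for illustration only):

# Access the name-mangled private static method (illustration only)
build_prompt = ChatGPTTypedChains._ChatGPTTypedChains__build_prompt

# Prints: Sum these values {x} with type list[float] and values < 0.8
print(build_prompt("Sum these values", [('x', 'list[float]', 'values < 0.8')]))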
The method __call__ executes the workflow as a LangChain sequential chain. It takes two arguments: the input to the workflow (input to the first task), _input_values, and the names/keys of the output values (output from the last task in the sequence), output_keys.
    def __call__(self, _input_values: dict[str, str], output_keys: list[str]) -> dict[str, Any]:
        """
        Execute the sequence of typed tasks (LLM chains)
        @param _input_values: Input values to the sequence
        @param output_keys: Output keys for the sequence
        @return: Dictionary of output variables -> values
        """
        chains_sequence = SequentialChain(
            chains=self.chains,
            input_variables=self.input_keys,
            output_variables=output_keys,
            verbose=True
        )
        return chains_sequence(_input_values)
Simple use cases
- Two numerical tasks (math functions: sum and exp)
- Term frequency-Inverse document frequency (TF-IDF) scoring and ordering task
Numerical computation chain
This first workflow consists of two tasks:
- The sum of an array x of type list[float] for which values < 0.8
- Application of the exponential function to the sum
In this particular example, an array of 128 floating-point values is generated through a sine function, then filtered with the condition x < 0.8. The output value is a dictionary with a single key, 'u'.
def numeric_tasks() -> dict[str, str]:
    import math

    chat_gpt_seq = ChatGPTTypedChains(0.0)

    # First task: sum the values generated by n -> sin(n*0.001)
    input_x = ','.join([str(math.sin(n * 0.001)) for n in range(128)])
    chat_gpt_seq.append("Sum these values ", [('x', 'list[float]', 'values < 0.8')], 'res')

    # Second task: compute u = exp(res)
    chat_gpt_seq.append("Compute the exponential value of ", [('res', 'float', '')], 'u')

    input_values = {'x': input_x}
    output: dict[str, str] = chat_gpt_seq(input_values, ["u"])
    return output
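Large language models are notoriously unreliable at arithmetic, so it is worth validating the value returned for u against a pure Python reference computation; a sketch using the same input generation:

import math

# Reference value for the output key 'u', computed without the LLM
values = [math.sin(n * 0.001) for n in range(128)]
expected_u = math.exp(sum(v for v in values if v < 0.8))
print(f'Expected value for u: {expected_u}')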
TF-IDF score
This second use case consists of two tasks (LLM chains)
- Computation of TF-IDF score, tf_idf_score of terms extracted from 3 documents/files (file1.txt, file2.txt, file3.txt). The key for input values, documents, is the content of the 3 documents.
- Ordering the items by their TF-IDF score. The output key, ordered_list is the list of terms ranked by their decreasing TF-IDF score.
def load_content(file_name: str) -> str:
    with open(file_name, 'r') as f:
        return f.read()

def load_text(file_names: list[str]) -> list[str]:
    return [load_content(file_name) for file_name in file_names]
def tf_idf_score() -> str:
    chat_gpt_seq = ChatGPTTypedChains(0.0)

    # Load the documents for which the TF-IDF scores have to be computed
    input_files = ['../input/file1.txt', '../input/file2.txt', '../input/file3.txt']
    input_documents = '```'.join(load_text(input_files))

    # First task: compute the TF-IDF score of the terms across the documents
    chat_gpt_seq.append(
        "Compute the TF-IDF score for words from documents delimited by triple backticks with output format term:TF-IDF score ```",
        [('documents', 'list[str]', '')], 'terms_tf_idf_score')

    # Second task: order the terms by decreasing TF-IDF score
    chat_gpt_seq.append("Sort the terms and TF-IDF score by decreasing order of TF-IDF score",
                        [('terms_tf_idf_score', 'list[float]', '')], 'ordered_list')

    output = chat_gpt_seq({'documents': input_documents}, ["ordered_list"])
    return output['ordered_list']
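A minimal driver for this second workflow might look as follows (a sketch that assumes the three text files exist under ../input/):

if __name__ == '__main__':
    # Execute the two-task TF-IDF workflow and print the ranked terms
    print(tf_idf_score())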
References