Target audience: Beginner

Estimated reading time: 3'

This post describes a generic implementation of a client to the OpenAI ChatGPT REST web service. This implementation has been written and tested with Python 3.9. Comments in the source code are omitted for the sake of clarity.

ChatGPT API overview

HTTP parameters

Request body

Chat completion request

ChatGPT client

Post

References

Notes:

Environments: Python 3.9.16, ChatGPT 3.5
To enhance the readability of the algorithm implementations, we have omitted non-essential code elements like error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers, and import statements.

ChatGPT API overview

Let's review the parameters for OpenAI Chat Completion REST API

HTTP Parameters

The connectivity parameters for the HTTP post are

OPEN_AI_KEY=xxxxx
URL= https://api.openai.com/v1/chat/completions
CONTENT_TYPE=application/json
AUTHORIZATION="Bearer ${OPEN_AI_KEY}"

Request body

The parameters of the POST content:

model: Identifier of the model (i.e. gpt-3.5-turbo)
messages: Text of the conversation
user: Identifier for the user
role: Role of the user {system|user|assistant}
content: Content of the message or request
name: Name of the author
temperature: Hyper-parameter that controls the "creativity" of the language model by adjusting the distribution (softmax) for the prediction of the next word/token. The higher the value (> 0) the more diverse the prediction (default: 0)
top_p: Sample the tokens with top_p probability. It is an alternative to temperature (default: 1)
n: Number of solutions/predictions (default 1)
max_tokens: Limit the number of tokens used in the response (default: Infinity)
presence_penalty: Penalize new tokens which appear in the text so far if positive. A higher value favors most recent topics (default: 0)
frequency_penalty: Penalize new tokens which appear in the text with higher frequency if the value is positive (default: 0)

Note: OpenAI models are non-deterministic, as identical requests may yield different answers. Setting temperature = 0 will make the outputs mostly deterministic.

Chat completion request

The first step is to implement the data structure for the content of the request as described in the previous section. The content of the request, ChatGPTRequest is a read-only and therefore implemented as a data class. The class members reflect the semantic of the OpenAI chat completion API.

model: Identifier of the model (i.e. gpt-3.5-turbo, code-davinci-002 ...)
role: Role of the user as system, user or assistant
temperature: Hyper-parameter that adjusts the distribution for the prediction of the next token
max_tokens: Limit the number of tokens used in the response
top_p: Sample the tokens with p highest probability (default 1)
n Number predictions/choices (default 1)
presence_penalty: Penalize new tokens which appear in the text so far
frequency_penalty: Penalize new tokens which appear in the text with higher frequency

from dataclasses import dataclass

@dataclass
class ChatGPTRequest:
   model: str
   role: str
   temperature: float
   max_tokens: int
   top_p: int
   n: int
   presence_penalty: int
   frequency_penalty: int

ChatGPT client

There are two constructors for the client of type ChatGPTClient

Default constructor for a fully customizable request ChatGPTRequest as argument.
Constructor build for simple request that required only model, role and temperature (evaluated with values 0 and 1) with all other parameters using the default values specified in the openAI API, openapi.ChatCompletion, documentation [ref 1]. We use the annotated type, InstanceType to specify the type of instance to create [ref 2]

from typing import Type, TypeVar, Callable

Instancetype = TypeVar('Instancetype', bound='ChatGPTClient')

class ChatGPTClient(object):
    import constants


      # static variable for the API key and the default maximum

      # number of tokens returned
    openai.api_key = constants.openai_api_key
    default_max_tokens = 1024

    def __init__(self, chatGPTRequest: ChatGPTRequest):
       self.chatGPTRequest = chatGPTRequest

   @classmethod
   def build(cls, model: str, role: str, temperature: float) -> Instancetype: 
       chatGPTRequest = ChatGPTRequest( 
           model, 
           role, 
           temperature,
           ChatGPTClient.default_max_tokens, 
           1, 
           1, 
           0, 
           0)
       return cls(chatGPTRequest)

Post

Let's start with a simple version of the invocation that returns only the answer without any explanation, status or usage. The only argument of the HTTP post is the user prompt. The response is extracted from the message of the first choice of the answer.

def post(self, prompt: str) -> str:     
   import logging

   try:
       response = openai.ChatCompletion.create(  
           model=self.chatGPTRequest.model,
           messages=[{'role': self.chatGPTRequest.user, 'content': prompt}],
           temperature=self.chatGPTRequest.temperature,
           max_tokens=self.chatGPTRequest.max_tokens
       )
       return response['choices'][0].message.content

   except  openai.error.AuthenticationError as e:
       logging.error(f'Failed as {str(e)}')

Some advanced client applications may require processing and evaluating some metadata included in the response. In this case, the JSON content of the Chat completion answer is objectified.

The key variables of the ChatCompletion response are

id: Conversation identifier
object: Payload of the response
created: Creation date
usage.prompt_tokens: Number of tokens used in the prompt
usage.completion_tokens: Number of tokens used in the completion
usage.total_tokens Total number of tokens
choices
choices.message.role Role used in the request
choices.message.content Response content
choices.finish_reason Description of the state of completion of the request

As with the request content, the difference components of the answer are implemented as data classes to reflect the structure of the JSON response from the ChatCompletion API.

The previous implementation of the post method is upgraded by adding a conversion of the JSON response into ChatGPTResponse.

@dataclass
class ChatGPTChoice:
   messages: []
   index: int
   finish_reason: str

@dataclass
class ChatGPTUsage:
   prompt_tokens: int
   completion_tokens: int
   total_tokens: int

@dataclass
class ChatGPTResponse:
   id: str
   object: str
   created: int
   model: int
   choices: []
   usage: ChatGPTUsage


def post_dev(self, prompt: str) -> ChatGPTResponse:
   import json
   import logging
        
   try:
      response = openai.ChatCompletion.create(
          model=self.chatGPTRequest.model,
          messages=[{'role': self.chatGPTRequest.user, 'content': prompt}],
          temperature=self.chatGPTRequest.temperature,
          max_tokens=self.chatGPTRequest.max_tokens
      )
      return json.loads(response)

   except  openai.error.AuthenticationError as e:
       logging.error(f'Failed as {str(e)}')

Note: We are using the built-in json Python library [ref 3] to decode the ChatGPT response. The list of alternative libraries include Orjson [ref 4] and SimpleJson [ref 5]

References

[1] OpenAI API

[2] `TypeVar`s explained

[3] Working With JSON Data in Python

[4] Orjson

[5] Simplejson library

---------------------------

Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning.
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3

Saturday, January 21, 2023

ChatGPT API Python Client

ChatGPT API overview

HTTP Parameters

Request body

Chat completion request

ChatGPT client

Post

References

No comments:

Post a Comment

Contact Form

Equation Editor

Popular Posts