Fine Tuning OpenAI’s GPT3 Model

OpenAI’s GPT-3 models have taken the world by storm with their impressive capabilities straight out of the box. However, it’s important to note that these models are primarily generalized and may require additional training to excel in specific use cases. While the base models perform well with well-crafted prompts and contextual examples, fine-tuning can eliminate the need for providing such examples and produce even better results.

Photo credit: Jonathan Kemper on Unsplash

Introduction

OpenAI’s GPT-3 models have taken the world by storm with their impressive capabilities straight out of the box. However, it’s important to note that these models are primarily generalized and may require additional training to excel in specific use cases. While the base models perform well with well-crafted prompts and contextual examples, fine-tuning can eliminate the need for providing such examples and produce even better results.

The benefits are:

  • Fine-tuning GPT-3 models can improve the quality of results compared to the base model.

  • It can result in cost savings by reducing the length of prompts required.

  • Shorter prompts can lead to lower latency requests, making the model more efficient.

  • Fine-tuning allows the model to better understand the task without requiring large amounts of contextual data.

  • Overall, fine-tuning can enhance the performance of GPT-3 models for specific use cases, making them more accurate, efficient, and easier to train.

In this article, we’ll explore how OpenAI’s APIs can simplify the process of fine-tuning a model of your choice for specific use cases. With the help of these APIs, you can quickly and easily fine-tune a model and use it for inference right away, without requiring extensive coding knowledge or technical expertise. We’ll delve into the step-by-step process of fine-tuning and demonstrate how it can enhance the performance of GPT-3 models for niche tasks.

Setting up for the experiment

The only library that we need to install for fine-tuning and inference is the openai Python library which we can install via pip as follows -

pip install --upgrade openai

We also would need an OpenAI API Key which we can generate from their dashboard. The key is then assigned to an environment variable as follows -

export OPENAI_API_KEY="<OPENAI_API_KEY>"

Preparing the dataset

In this experiment, we are going to predict whether the headline of a news article is “Clickbait” or “Not Clickbait”. The dataset we are using can be found here and contains 32K rows half of which are labeled as “Clickbait”.

Preparing the dataset

We have renamed the columns “headline” as “prompt” and “clickbait” as “completion” so as to meet the API requirements. The column “clickbait” was a boolean column in which 1 was replaced by “Clickbait” and 0 with “Not Clickbait”. The distribution of classes is as follows -

Preparing the dataset 2

The API requires the data to be in JSONL format with two keys in each entry - “prompt” and “completion”. To convert the above Pandas dataframe into JSONL we run the following Python script -

df.to_json("./data/clickbait.jsonl", orient='records', lines=True)

This creates the clickbait.jsonl file with entires that look like

{
   "prompt":"Should I Get Bings",
   "completion":"Clickbait"
}
{
   "prompt":"Which TV Female Friend Group Do You Belong In",
   "completion":"Clickbait"
}
{
   "prompt":"The New \"Star Wars: The Force Awakens\" Trailer Is Here To Give You Chills",
   "completion":"Clickbait"
}

The training and validation sets are generated by running the following command -

openai tools fine_tunes.prepare_data -f ./data/clickbait.jsonl -q

training and validation sets

Model Fine-Tuning

We are going to use the ada model for further fine-tuning. Is is because it is the fastest to train in comparison to the other models. A fine-tuning job is created by running the following command -

openai api fine_tunes.create -t "./data/clickbait_prepared_train.jsonl" -v "./data/clickbait_prepared_valid.jsonl" --compute_classification_metrics --classification_positive_class " Clickbait" -m ada

To check the status of the job, we can run -

openai api fine_tunes.follow -i <FINE-TUNING-ID>

check finetune job data

Once the training is complete, we run the following command to get predictions -

openai api fine_tunes.results -i <FINE-TUNING-ID> > result.csv

Model Evaluation

The results.csv contains the following columns -

  • step
  • elapsed_tokens
  • elapsed_examples
  • training_loss
  • training_sequence_accuracy
  • training_token_accuracy
  • validation_loss
  • validation_sequence_accuracy
  • validation_token_accuracy
  • classification/accuracy
  • classification/precision
  • classification/recall
  • classification/auroc
  • classification/auprc
  • classification/f1.0

fine-tuning-data-1

This is how the progress of fine-tuning looked like during the first 9 steps. The last entry in this CSV gives us the evaluation metrics for the fine-tuned model. As we can see that we have reached 100% precision and over 99.7% of recall. Nice!

fine-tuning-data-2

Since the dataset is balanced, we could look at accuracy as a good evaluation metric. The accuracy of predictions on the dataset improved over time, as demonstrated by this figure -

Model accuracy during training

Model Accuracy during Traning

To check the various fine-tuned models and their metadata we can run the following command -

openai api fine_tunes.list

To run inference using our fine-tuned model we can directly call it via the Python API as demonstrated here -

ft_model = '<UPLOADED-MODEL-IDENTIFIER>'
res = openai.Completion.create(model=ft_model, prompt=prompt_text + '\n\n###\n\n', max_tokens=1, temperature=0)
res['choices'][0]['text']

Model Suite and Pricing

At the time of writing this article, OpenAI has priced fine-tuning their models as follows -

Fine Tuning Cost

Fine Tuning Cost [Source: openai.com]

To calculate the number of tokens present in our text, we can use the tiktoken library.

Conclusion

I hope this short tutorial was helpful is understanding how to fine-tune OpenAI’s GPT3 models. The same process applies to all the other model offerings and the standardized template helps us to get started and run inference in no time.

References

You may also be interested in:


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.