Fine-Tuning FTW!

We fine-tuned our first AI model and it was magical!

We wanted to feed unstructured legal contracts into a model and extract structured data to enter into our database, instead of having to do that manually.

Seemed like a perfect job for AI.

We started by iterating on our prompt to see how far that approach could take us, including few-shot prompting: embedding a few example inputs and outputs in the prompt itself.
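A minimal sketch of what few-shot prompting looks like in practice, using the chat-style message format common to hosted LLM APIs. The field names ("party", "effective_date", "term_months") and the example contracts are hypothetical stand-ins, not our actual schema:

```python
import json

# Hypothetical instruction and few-shot examples for contract extraction.
SYSTEM = "Extract the fields below from the contract and reply with JSON only."

FEW_SHOT_EXAMPLES = [
    (
        "This Agreement is made by Acme Corp, effective March 1, 2024, "
        "for a term of 12 months.",
        {"party": "Acme Corp", "effective_date": "2024-03-01", "term_months": 12},
    ),
    (
        "Globex Ltd enters into this contract on 2023-07-15 "
        "for twenty-four (24) months.",
        {"party": "Globex Ltd", "effective_date": "2023-07-15", "term_months": 24},
    ),
]

def build_messages(contract_text: str) -> list[dict]:
    """Build a chat message list: system prompt, then one user/assistant
    pair per worked example, then the new contract to extract from."""
    messages = [{"role": "system", "content": SYSTEM}]
    for text, fields in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(fields)})
    messages.append({"role": "user", "content": contract_text})
    return messages

msgs = build_messages(
    "Initech Inc signs this agreement on 2025-01-02 for 6 months."
)
```

The resulting list is what you would pass as the `messages` argument to a chat completion call; the model sees the example pairs and imitates the output format for the final contract.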

Prompting alone, even with few-shot examples, got us pretty far, but not far enough.

It handled the more standard contracts most of the time, but it never came close to 100% success across all of the contracts.

So, we moved on to fine-tuning.

The most time-consuming part by far was preparing the data to fine-tune the model. The rest of the process was pretty straightforward and quick.
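The data preparation step comes down to turning each labeled contract into one training record. A sketch of that, assuming the chat-style JSONL format used by hosted fine-tuning APIs (one JSON object per line, each holding a full example conversation); the contracts and field names here are hypothetical:

```python
import json

# Hypothetical hand-labeled examples: (contract text, extracted fields).
labeled_examples = [
    ("This Agreement is made by Acme Corp ...", {"party": "Acme Corp"}),
    ("Globex Ltd enters into this contract ...", {"party": "Globex Ltd"}),
]

def to_jsonl_lines(examples) -> list[str]:
    """Convert (text, fields) pairs into chat-format JSONL lines:
    each line is a complete system/user/assistant conversation."""
    lines = []
    for contract_text, fields in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Extract the fields as JSON."},
                {"role": "user", "content": contract_text},
                {"role": "assistant", "content": json.dumps(fields)},
            ]
        }
        lines.append(json.dumps(record))
    return lines

# Write one record per line -- the file you upload to the fine-tuning API.
with open("train.jsonl", "w") as f:
    f.write("\n".join(to_jsonl_lines(labeled_examples)) + "\n")
```

Most of the real effort is in producing `labeled_examples` accurately; the serialization itself, as the post says, is the quick part.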

We started fine-tuning with 10 examples and saw a lot of improvement. And with only 25 examples, we had hit our goal.
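Knowing when you have "hit the goal" requires a metric. A simple one we could use here, sketched below, is exact-match success rate on a held-out set of contracts; the predictions are hypothetical stand-ins for real model output:

```python
def success_rate(predictions: list[dict], ground_truth: list[dict]) -> float:
    """Fraction of contracts where every extracted field matches the
    hand-labeled ground truth exactly."""
    hits = sum(p == g for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)

# Hypothetical model outputs vs. hand labels: 2 of 3 exact matches.
preds = [{"party": "Acme Corp"}, {"party": "Globex Ltd"}, {"party": "Initech"}]
truth = [{"party": "Acme Corp"}, {"party": "Globex Ltd"}, {"party": "Initech Inc"}]
rate = success_rate(preds, truth)
```

Re-running this after each fine-tuning round (10 examples, then 25) is what tells you whether adding more training data is still paying off.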

The tools for building AI products are great and still improving quickly. If you want to build something, there's no reason to wait and no better way to learn than just getting started!


Some definitions from ChatGPT:

“Few-shot prompting is a technique used in natural language processing (NLP) where an AI model is provided with a small number of examples (or “shots”) to guide it in generating a response to a given task. In this context, “few-shot” refers to the model being shown only a few examples of input-output pairs, after which it can generalize from those examples to complete similar tasks.”

“Fine-tuning an AI model is the process of taking a pre-trained model (typically one that has been trained on a large dataset) and further training it on a smaller, task-specific dataset. The goal is to adapt the general knowledge of the pre-trained model to a more specialized task without starting the training process from scratch.”

“Training loss in fine-tuning refers to a metric that measures how well a model is performing during the process of fine-tuning. It quantifies the difference between the model’s predictions and the actual target values (or ground truth) for a given dataset. The goal of minimizing training loss is to improve the model’s ability to generalize and make accurate predictions.”
