Training Protocols

April 7, 2024

Training a model properly usually involves several standard protocols.

Dataset

A standard dataset is often divided into three parts (see the splitting sketch after this list):

  • Training set: The portion of the dataset used during training to optimize the model's trainable parameters. It usually accounts for the majority of the total dataset.
  • Test set: The portion held out for the final evaluation of the model's performance. It should not influence any training or tuning decisions.
  • Validation set: The portion used during training to select the best hyperparameters, such as the learning rate, and to decide when to stop training.
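
As an illustration, the sketch below uses scikit-learn's `train_test_split` twice to carve out roughly 80/10/10 training/validation/test partitions. The ratios and the placeholder arrays `X` and `y` are assumptions for the example, not a fixed rule.

```python
# A minimal sketch of a 3-way dataset split (80/10/10 is an illustrative choice).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)             # placeholder features
y = np.random.randint(0, 2, size=1000)   # placeholder labels

# First carve out the test set (10% of the data).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)
# Then split the remainder into training and validation sets
# (1/9 of the remaining 90% is about 10% of the original data).
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=1 / 9, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```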

Training

Training is usually divided into epochs. Each epoch represents a full run-through of all the training samples or batches.

Usually, as the number of epochs increases, the training loss keeps decreasing. The validation loss, however, typically reaches a minimum and then starts increasing as the model begins to overfit the training data.
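
A minimal sketch of an epoch-based training loop (here in PyTorch, with a toy model and synthetic data as stand-ins) makes the structure concrete: each epoch is one full pass over the training batches, followed by an evaluation pass over the validation set.

```python
# A minimal epoch-loop sketch; the toy model, synthetic data, and
# 20-epoch budget are illustrative assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X_train, y_train = torch.randn(800, 20), torch.randn(800, 1)
X_val, y_val = torch.randn(100, 20), torch.randn(100, 1)
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):                   # one epoch = one pass over all training batches
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()                          # evaluate on the validation set after each epoch
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    print(f"epoch {epoch + 1}: val_loss={val_loss:.4f}")
```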

![[Pasted image 20241101121807.png]]

One way to prevent this is early stopping, a technique that halts training at roughly the point where the validation loss stops decreasing and begins to rise.

Patience is a hyperparameter that determines how many additional epochs training is allowed to continue after the validation loss stops improving. A suitable value is typically found through trial and error, using the validation set.
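
The sketch below shows how a patience counter is typically wired into the loop. The `train_one_epoch` and `evaluate` functions are hypothetical stand-ins; `evaluate` simulates a validation loss that bottoms out and then rises, purely so the example runs end to end.

```python
# A minimal sketch of early stopping with a patience counter.
def train_one_epoch(epoch):
    pass  # stand-in: run one pass over the training batches

def evaluate(epoch):
    # stand-in: simulated validation loss that decreases, then rises after epoch 10
    return 1.0 / (epoch + 1) + max(0, epoch - 10) * 0.05

patience = 3                      # epochs tolerated without improvement
best_val_loss = float("inf")
epochs_without_improvement = 0

for epoch in range(100):
    train_one_epoch(epoch)
    val_loss = evaluate(epoch)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        # the best model weights are typically checkpointed here
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch + 1}")
            break
```

In practice, the best-performing weights are usually saved whenever the validation loss improves and restored after training stops.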

Fine-tuning

Fine-tuning refers to the process of adapting a pre-trained model to a downstream task; the upstream task is the task the model was originally trained on.

Fine-tuning is commonly used for two reasons:

  • The amount of data available for the downstream task is very limited. If the downstream task is similar to an upstream task with more abundant data, fine-tuning is an appropriate choice.
    • One example of this is fine-tuning neural machine translation models for low-resource languages.
  • To limit training costs by reusing the patterns already encoded in an existing model.

Fine-tuning is now especially prevalent with LLMs and foundation models.
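
As a sketch of the mechanics (not a recipe for any specific model), fine-tuning in PyTorch often amounts to freezing the pre-trained backbone and training only a small new head on the downstream data. The backbone below is a stand-in for a real pre-trained checkpoint, and the 5-class head, batch, and learning rate are illustrative assumptions.

```python
# A minimal fine-tuning sketch: freeze the pre-trained backbone, train a new head.
import torch
from torch import nn

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU())  # stand-in for a pre-trained model
for param in backbone.parameters():
    param.requires_grad = False            # reuse the patterns already encoded, unchanged

head = nn.Linear(256, 5)                   # new head for the downstream task (5 classes assumed)
model = nn.Sequential(backbone, head)

# Only the head's parameters are given to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 128)                   # placeholder batch of downstream data
y = torch.randint(0, 5, (16,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```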