Cross Entropy Loss is a loss function that is commonly used in classification tasks.
Before discussing Cross Entropy Loss, it is helpful to first understand the process of classification.
Classification
The goal of classification is to predict the class of a data point. The output of a classification model is usually a vector of values for each input. Each value of the vector is a logit, or an unnormalized score that reflects the model's confidence for that specific class.
The softmax function is then applied to turn these unnormalized scores into probabilities that sum to one. In other words, it normalizes the output into a probability distribution over the classes.
The equation for the softmax function is as follows:

$$\mathrm{softmax}(z)_y = \frac{e^{z_y}}{\sum_{c=1}^{C} e^{z_c}}$$

where $z$ is the vector of logits, $z_y$ is the logit for class $y$, and $C$ is the number of classes.
In this equation:
- The numerator exponentiates the score for class $y$ in order to:
  - Ensure that the value is positive
  - Magnify the differences between the scores
- The denominator sums the exponentiated scores for all possible classes in order to ensure that the output is a normalized probability value.
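As a concrete illustration, here is a minimal sketch of the softmax function in Python with NumPy; the example `logits` array and the max-subtraction step (a common numerical-stability trick) are my own additions rather than part of the equation above.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn a vector of unnormalized scores (logits) into probabilities."""
    # Subtracting the maximum logit before exponentiating avoids overflow
    # and does not change the result.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)        # numerator: exponentiate each score
    return exps / np.sum(exps)    # denominator: normalize so the values sum to one

# Example with three classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```

Note how the exponentiation magnifies differences: the class with the highest logit receives most of the probability mass.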
Cross Entropy Loss
The cross entropy loss function is as follows:

$$L_{CE} = -\log\left(\frac{e^{z_y}}{\sum_{c=1}^{C} e^{z_c}}\right) = -\log(p_y)$$

where $y$ is the correct class and $p_y$ is the softmax probability that the model assigns to it.
In the equation:
- The log function is applied to the probability calculated through the softmax function in order to magnify the loss when the predicted probability of the correct class is far from one. This polarization encourages the model to make more accurate predictions.
Therefore, the model should be trained to maximize the probabilities of the correct classes for the input values in order to minimize the cross entropy loss.
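To make this concrete, below is a minimal sketch of the cross entropy loss for a single example in Python with NumPy; the logits, the class indices, and the log-sum-exp formulation are illustrative choices of mine, not something prescribed by the equation.

```python
import numpy as np

def cross_entropy_loss(logits: np.ndarray, correct_class: int) -> float:
    """Negative log of the softmax probability assigned to the correct class."""
    shifted = logits - np.max(logits)                      # stability trick, as in the softmax sketch
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))  # log-softmax over all classes
    return float(-log_probs[correct_class])

logits = np.array([2.0, 1.0, 0.1])
print(cross_entropy_loss(logits, correct_class=0))  # ~0.42: correct class already likely, small loss
print(cross_entropy_loss(logits, correct_class=2))  # ~2.32: correct class given low probability, large loss
```

The two calls show the polarizing effect of the log described above: when the softmax probability of the correct class is close to one the loss is small, and as that probability approaches zero the loss grows without bound.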