What Does The Implementation Of keras.losses.sparse_categorical_crossentropy Look Like?
Solution 1:
It does not do anything special: it simply produces the one-hot encoded labels inside the loss for each batch of data (not the whole dataset at once), uses them when needed, and then discards the result. So it's just a classic trade-off between memory and computation.
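As a rough illustration of that trade-off (a conceptual sketch, not the actual Keras source; the function name sparse_ce_via_one_hot is made up), the sparse loss can be emulated by one-hot encoding the integer labels for the current batch only and then reusing the dense categorical crossentropy:

```python
import tensorflow as tf

def sparse_ce_via_one_hot(y_true, y_pred):
    """Conceptual sketch: build the one-hot labels for this batch only,
    reuse the dense categorical crossentropy, and let the temporary
    one-hot tensor be discarded along with the rest of the batch."""
    num_classes = tf.shape(y_pred)[-1]
    y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.int32), depth=num_classes)
    return tf.keras.losses.categorical_crossentropy(y_true_one_hot, y_pred)
```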
Solution 2:
The formula for categorical crossentropy is the following:

loss = -sum(y_true * ln(y_pred))

Where y_true is the ground truth data and y_pred is your model's predictions. The bigger the dimensions of y_true and y_pred, the more memory is necessary to perform all these operations.
But notice an interesting trick in this formula: only one of the neurons in y_true is 1, all the rest are zeros! This means only one term in the sum is non-zero.
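A tiny numeric check of that observation (the numbers are just an invented 4-class example): the full sum and the single surviving term give the same value.

```python
import numpy as np

# Invented 4-class example: the true class is index 2.
y_true_one_hot = np.array([0., 0., 1., 0.])
y_pred = np.array([0.1, 0.2, 0.6, 0.1])

dense_loss = -np.sum(y_true_one_hot * np.log(y_pred))  # full categorical crossentropy
sparse_loss = -np.log(y_pred[2])                        # only the non-zero term
assert np.isclose(dense_loss, sparse_loss)              # both equal -ln(0.6), about 0.511
```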
What a sparse formula does is:

- Avoid the need for a huge matrix for y_true, using only indices instead of one-hot encoding.
- Pick from y_pred only the column corresponding to that index, instead of performing calculations for the entire tensor.
So, the main idea of a sparse formula here is (see the sketch after this list):

- Gather columns from y_pred with the indices in y_true.
- Calculate only the term -ln(y_pred_selected_columns).
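Here is a minimal sketch of those two steps in TensorFlow (again, not the actual Keras implementation, which also handles logits, reshaping, and numerical stability); it assumes y_pred already holds probabilities:

```python
import tensorflow as tf

def sparse_ce_by_gather(y_true, y_pred):
    """Sketch of the sparse idea: for each sample, gather the predicted
    probability of its true class, then take -ln of it."""
    indices = tf.cast(y_true, tf.int32)
    # Per row of y_pred, pick the column named by the label index.
    selected = tf.gather(y_pred, indices, batch_dims=1)
    return -tf.math.log(selected)
```

Called on integer labels of shape (batch_size,) and probabilities of shape (batch_size, num_classes), this returns one loss value per sample, just like the built-in loss does.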