Is there an alternative to categorical cross-entropy with a notion of "class distance"?

by boomkin   Last Updated September 20, 2018 13:19 PM

I have a signal $ x \in \mathbb{R}^{t \times l} $ which is discretized into $ l = 32 $ levels for $ t = 100000 $ time points. This enables me to turn a regression problem into a classification problem, which is more tractable mathematically for my application.

I understand that a classification problem should use categorical cross-entropy, and since at any time point $t$ exactly one level in $ x $ is 1, sparse categorical cross-entropy (with integer labels) is probably the natural fit.

However, in this problem setting the levels have a natural order, so predicting level $ l = 15 $ when the true level is $ l = 16 $ is not as bad as predicting $ l = 1 $.
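To make concrete what I mean by "not as bad": one could, for instance, replace the one-hot target with a smoothed target that gives partial credit to nearby levels, and keep ordinary cross-entropy. A minimal NumPy sketch (the function name and the Gaussian smoothing width `sigma` are my own illustrative choices, not an established recipe):

```python
import numpy as np

def soft_ordinal_targets(true_level, n_levels=32, sigma=1.0):
    """Replace a one-hot target with a Gaussian bump centred on the
    true level, so that under cross-entropy, probability mass placed
    on nearby levels is penalised less than mass on distant levels.
    `sigma` controls how quickly credit falls off with level distance."""
    levels = np.arange(n_levels)
    weights = np.exp(-0.5 * ((levels - true_level) / sigma) ** 2)
    return weights / weights.sum()  # normalise to a valid distribution

target = soft_ordinal_targets(16)
# target peaks at level 16, and level 15 gets far more mass than level 1
```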

Is there any way to incorporate this information into the loss function?

I looked at the Wasserstein distance, but I'm not advanced enough in mathematics to know whether it admits a closed-form loss function over my classes; as far as I understand, though, it would do something like this.
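For what it's worth, on a one-dimensional ordered support the Wasserstein-1 distance does reduce to a closed form: the L1 distance between the two cumulative distribution functions (times the bin spacing). A minimal NumPy sketch, assuming unit-spaced levels (function names are mine):

```python
import numpy as np

def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions over the same
    ordered, unit-spaced support: sum of |CDF_p - CDF_q| over the bins."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def one_hot(i, n=32):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# For one-hot targets the distance equals the level separation,
# which is exactly the "class distance" behaviour described above:
wasserstein_1d(one_hot(16), one_hot(15))  # -> 1.0
wasserstein_1d(one_hot(16), one_hot(1))   # -> 15.0
```

Note that as written this is a distance between distributions, not yet a differentiable training loss; one would still have to apply it to the softmax output and check that the gradients are usable.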
