by Max Ghenis
Last Updated November 09, 2018 01:19 AM

I'm synthesizing data from models trained on a source dataset, and I'm looking for a loss function to compare different data synthesis methods*. I have some ideas below, but each has drawbacks and none is very elegant. Is there an established loss function for comparing high-dimensional joint distributions?

Here are my ideas. They all look at one variable at a time rather than treating the joint distribution explicitly, so each would have to be evaluated across strata.

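- **MSE:** Just compares means, without considering distributions.
- **Kolmogorov-Smirnov D statistic:** e.g. the per-variable D statistics, averaged or summed. D only reflects the largest gap between the two CDFs, so it doesn't consider the full distribution.
- **Deviations from quantiles:** e.g. differences between real and synthetic values at some set of equally spaced quantiles. Captures more of the distribution.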
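A minimal NumPy sketch of the three per-variable metrics above, plus a wrapper for evaluating any of them within strata. It assumes `real` and `synth` are 2-D arrays with one column per variable and one row per record; the function names, the 0.1 to 0.9 quantile grid, and the weighting by real stratum size are all illustrative choices, not an established method.

```python
import numpy as np

def mean_sq_error(real, synth):
    """Squared differences between real and synthetic column means, summed over variables."""
    return float(np.sum((real.mean(axis=0) - synth.mean(axis=0)) ** 2))

def ks_d(x, y):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two empirical CDFs."""
    grid = np.concatenate([x, y])  # the supremum is attained at one of the pooled points
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))

def mean_ks(real, synth):
    """Average of the per-variable D statistics."""
    return float(np.mean([ks_d(real[:, j], synth[:, j]) for j in range(real.shape[1])]))

def quantile_loss(real, synth, n_quantiles=9):
    """Mean squared deviation at equally spaced quantiles (hypothetical 0.1..0.9 grid)."""
    qs = np.linspace(0.1, 0.9, n_quantiles)
    q_real = np.quantile(real, qs, axis=0)    # shape (n_quantiles, n_vars)
    q_synth = np.quantile(synth, qs, axis=0)
    return float(np.mean((q_real - q_synth) ** 2))

def stratified_loss(real, synth, real_strata, synth_strata, loss):
    """Apply `loss` within each real stratum and average, weighting by real stratum size.

    Strata absent from the synthetic data are skipped here; penalizing them
    instead may be preferable in practice.
    """
    total, weight = 0.0, 0
    for s in np.unique(real_strata):
        r, y = real[real_strata == s], synth[synth_strata == s]
        if len(y) == 0:
            continue
        total += len(r) * loss(r, y)
        weight += len(r)
    return total / weight if weight else float("nan")
```

For example, `stratified_loss(real, synth, real_groups, synth_groups, mean_ks)` compares each variable's distribution within each group. This still captures only the marginals within each stratum, not the full joint distribution.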

Another idea is something like cosine distance, matching each synthetic record with its nearest real record.
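The nearest-record idea could be sketched as follows, assuming numeric records with no all-zero rows (a zero vector would divide by zero). Because cosine distance ignores magnitude and is sensitive to each column's scale, standardizing columns first is probably advisable; that step is left out here. The function names are hypothetical.

```python
import numpy as np

def nearest_cosine_distance(synth, real):
    """For each synthetic record, 1 - cosine similarity to its most similar real record."""
    s = synth / np.linalg.norm(synth, axis=1, keepdims=True)  # unit-length rows
    r = real / np.linalg.norm(real, axis=1, keepdims=True)
    sims = s @ r.T  # (n_synth, n_real) matrix of cosine similarities
    return 1.0 - sims.max(axis=1)

def mean_nearest_cosine_distance(synth, real):
    """Average nearest-neighbor cosine distance over all synthetic records."""
    return float(nearest_cosine_distance(synth, real).mean())
```

The full similarity matrix needs memory proportional to `n_synth * n_real`, so large datasets would need batching or an approximate nearest-neighbor index. This direction also only checks that synthetic records resemble some real record, not that the synthetic data covers the whole real distribution.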

* The loss function would be zero if passed the real data itself, so I'm separately checking that no synthetic record exactly matches a real one.
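The separate exact-match check could look something like this sketch, which hashes each real row as a tuple and counts the synthetic rows found among them. The function name is hypothetical, and rows containing NaN are never counted as matches because NaN does not compare equal to itself.

```python
import numpy as np

def count_exact_matches(synth, real):
    """Number of synthetic rows identical to at least one real row."""
    real_rows = {tuple(row) for row in np.asarray(real).tolist()}
    return sum(tuple(row) in real_rows for row in np.asarray(synth).tolist())
```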
