by Bram Vanroy
Last Updated May 02, 2018 08:19 AM - source

I am a bit confused about how to interpret correlation coefficient results. I am aware that there are numerous questions about the differences between Pearson, Spearman, and Kendall, but I am more interested in their respective relationship to linearity.

Let's assume that Pearson's `r` is `0.578` and `p` is `0.000012`. There is a correlation between the two variables that is most probably not due to chance. However, Pearson assumes **linearity** (and homoscedasticity) and is prone to errors when the data contain outliers.

Let's also assume that we draw a scatter plot and find that, indeed, there are outliers in our data. To minimise the effect of outliers, we run a Kendall test. Here we also find a small `p` and a positive `tau`. Kendall (and Spearman) do **not** assume linearity, hence their effectiveness when dealing with outliers. But what are the consequences for trying to fit the data on a line?
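To make the situation concrete, here is a minimal sketch of the comparison I have in mind. The data are made up purely for illustration (a perfectly linear trend with a single gross outlier), and I am assuming SciPy's `pearsonr`, `spearmanr`, and `kendalltau` as the implementations of the three tests.

```python
import numpy as np
from scipy import stats

# Hypothetical data (not from my real data set):
# a perfectly linear trend with one gross outlier at the end.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[-1] = -100.0  # the single outlier

# Each call returns the coefficient and a two-sided p-value.
r, p_r = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, y)
tau, p_tau = stats.kendalltau(x, y)

print(f"Pearson  r   = {r:.3f} (p = {p_r:.3g})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
print(f"Kendall  tau = {tau:.3f} (p = {p_tau:.3g})")
```

On data like these, the single outlier should pull Pearson's `r` down far more than the two rank-based coefficients, since the latter only see that one point is out of order, not how far away it is.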

If we have normally distributed, linear data (the ideal case for Pearson), we can fit all data points on a linear curve (cf. for instance `regplot()` of the Python package `seaborn`). But if we have outliers, and Pearson is not a viable option, is there still any assumption of linearity with Kendall or Spearman? Does it still make sense to try to fit the data on a linear curve, or any curve for that matter? Or does the relationship as defined by Kendall or Spearman say nothing about the fit of the data, meaning that it does not make sense to try to plot the data on a curve?
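One way to probe this, as a rough sketch rather than a definitive answer: take data that are strictly increasing but clearly not linear, then compare Kendall's `tau` with the quality of a straight-line fit. The exponential data below are invented for illustration, and `scipy.stats.linregress` is my assumption for an ordinary least-squares line (I have not checked that it matches what `regplot()` does internally).

```python
import numpy as np
from scipy import stats

# Hypothetical data: strictly increasing, but exponential rather than linear.
x = np.linspace(1.0, 10.0, 50)
y = np.exp(x / 2.0)

# Rank-based association: only the ordering of the points matters.
tau, p_tau = stats.kendalltau(x, y)

# Ordinary least-squares straight line through the same points.
fit = stats.linregress(x, y)
r_squared = fit.rvalue ** 2
residuals = y - (fit.intercept + fit.slope * x)

print(f"Kendall tau          = {tau:.3f}")
print(f"R^2 of straight line = {r_squared:.3f}")
print(f"residual at start / middle / end = "
      f"{residuals[0]:.2f} / {residuals[len(x) // 2]:.2f} / {residuals[-1]:.2f}")
```

If the residuals change sign systematically along `x` while `tau` stays high, that would suggest a strong monotonic relationship can coexist with a poor linear fit, which is exactly the case I am unsure how to interpret.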
