I am investigating the correlation coefficients for 13 categorical variables (1401 observations). To be able to build a correlation matrix, I coded each variable as 1 or 0 depending on its level. I then constructed the matrix and calculated the correlation coefficients.

The problem is that some of my variables (events) occur very rarely compared to others, and I think this gives me misleading results. I do not know how to account for how often each variable occurs. Should I remove the rarest variables? Should I normalize the calculated correlation coefficient against the frequency of the variable?

Thanks a lot, Emilie

So if I am reading your question correctly, you are getting a lot of 0 observations for some variables? That by itself is not misleading or a source of bias. If a variable elicits mostly 0 responses, say 95%, then, assuming your sample is large enough, this likely means the population P(0) ≈ 0.95. It isn't missing data, just a false, or zero, binomial response.
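To make the P(0) point concrete, here is a minimal sketch that estimates the share of zeros for one 0/1 variable together with its binomial standard error. The data are made up for illustration (1401 observations as in the question, with a hypothetical 70 events), and `zero_rate` is just a helper name I picked.

```python
import math

def zero_rate(values):
    """Return (share of zeros, binomial standard error of that share)."""
    n = len(values)
    p0 = values.count(0) / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return p0, se

# Hypothetical variable: 1401 observations, 70 of them are events (1).
rare = [1] * 70 + [0] * 1331
p0, se = zero_rate(rare)
print(f"estimated P(0) = {p0:.3f}, standard error = {se:.4f}")
```

With a sample of this size the standard error is small, so an observed zero share near 95% is a reasonable estimate of the population value rather than a sign of missing data.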

You shouldn't have a problem with your correlation matrix. Lots of "zeroes" across variables just means that those events have a low probability of occurrence (assuming you collected the data correctly and set up the matrix accordingly), and rare variables will agree with each other mostly in that neither occurs. Say, for instance, you have a dataset of binary occurrences of tsunamis, earthquakes (say above magnitude 7), and tornadoes. This isn't a meteorology or seismology forum, but bear with me. Many tornadoes will happen without a single occurrence of the other two. Obviously, when an earthquake happens, a tsunami is very likely, depending on the location. Despite having lots of "zeroes" for tsunamis and quakes, we have all the info we need: corr(ts, eq) will likely be close to 1, while corr(ts, tor) and corr(eq, tor) will be virtually zero, barring any data being taken after the second coming of Christ when all hell breaks loose.
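The tsunami/earthquake/tornado example can be sketched in a few lines. This is only an illustration under assumptions of mine: the data are synthetic and deterministic (20 quakes, 18 of them followed by a tsunami, and a tornado on every seventh observation), and `pearson` is a plain-Python helper that computes the ordinary Pearson correlation, which for 0/1 variables is the phi coefficient.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (phi for 0/1 data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

n = 1401  # same sample size as in the question

# Synthetic indicators (assumed for illustration only).
eq = [1 if i < 20 else 0 for i in range(n)]       # 20 rare earthquakes
ts = [1 if i < 18 else 0 for i in range(n)]       # 18 tsunamis, all after a quake
tor = [1 if i % 7 == 3 else 0 for i in range(n)]  # tornadoes unrelated to quakes

corr_ts_eq = pearson(ts, eq)
corr_ts_tor = pearson(ts, tor)
corr_eq_tor = pearson(eq, tor)

print(f"corr(ts, eq)  = {corr_ts_eq:.3f}")
print(f"corr(ts, tor) = {corr_ts_tor:.3f}")
print(f"corr(eq, tor) = {corr_eq_tor:.3f}")
```

Even though tsunamis and earthquakes are 0 in almost every row, their correlation comes out high, while both are close to uncorrelated with tornadoes.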

July 12, 2019 07:52 AM
