Can findings based on low values be considered valid?

by Pascal   Last Updated September 19, 2018 22:19 PM

Let's say I have a time series with N > 300 observations whose values look like [0, 1, 2, 1, 2, 0, 2, ...], representing the number of visitors to a website per day.

Since there are only a few visitors and each of them can be considered an individual, is there a way to "prove", or a statement in the literature, that these values are too low for, e.g., prediction with a random forest or simply for a correlation with other, better-performing websites? For example, if there is a high correlation driven by higher visitor counts on Mondays, can this correlation actually be considered valid?

Or, more specifically: can a p < 0.05 returned by scipy.stats.pearsonr actually be considered valid, even if the values in one of the input arrays are low?
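One way to probe this, as a minimal sketch rather than a definitive answer, is a permutation test: shuffle one series many times and count how often the shuffled correlation is at least as strong as the observed one. It makes no normality assumption, so its p-value can be compared with the one from scipy.stats.pearsonr. The data below is simulated and purely hypothetical, the helper names pearson_r and permutation_p_value are my own, and shuffling assumes the days are exchangeable, which a weekly pattern or autocorrelation would violate.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: a low-traffic site (0-2 visitors per day) and an
# unrelated high-traffic site, both simulated for 300 days.
data_rng = random.Random(42)
small_site = [data_rng.choice([0, 1, 2]) for _ in range(300)]
big_site = [data_rng.randint(800, 1200) for _ in range(300)]

r = pearson_r(small_site, big_site)
p = permutation_p_value(small_site, big_site)
print(f"r = {r:.3f}, permutation p = {p:.3f}")
```

If the permutation p-value and the pearsonr p-value roughly agree, the low magnitude of the counts alone is probably not what drives the result; a large gap would suggest the parametric p-value should not be trusted for this data.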

Additionally, let's say I did some SEO and my mean visitor count improves by 400%. The actual values will still be low and could still be due to random effects, or am I getting this wrong?
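Whether such an increase could be random can be checked directly, at least under a simple model. The sketch below assumes that daily visits are independent Poisson counts with one constant rate before the change and one after it; that is my assumption, not something the data guarantees. Under the null hypothesis of equal rates, and given the total number of visits, the visits that fall in the "after" period follow a binomial distribution, which yields an exact one-sided p-value. The function name and the sample numbers are hypothetical.

```python
import math


def poisson_rate_increase_p_value(before, after):
    """One-sided exact p-value that the daily rate rose from `before` to `after`.

    Conditional test: if both periods share one Poisson rate, then given the
    total count, the 'after' count is Binomial(total, share of 'after' days).
    Both lists must be non-empty.
    """
    k_after = sum(after)
    total = sum(before) + k_after
    share = len(after) / (len(before) + len(after))

    def log_pmf(k):
        # Log of the binomial probability mass, computed via lgamma so that
        # large totals do not overflow.
        return (math.lgamma(total + 1) - math.lgamma(k + 1)
                - math.lgamma(total - k + 1)
                + k * math.log(share) + (total - k) * math.log(1 - share))

    tail = sum(math.exp(log_pmf(k)) for k in range(k_after, total + 1))
    return min(1.0, tail)


# Hypothetical counts: four weeks before and four weeks after the SEO change.
before = [0, 1, 2, 1, 2, 0, 2] * 4
after = [4, 6, 5, 7, 3, 5, 6] * 4
print(poisson_rate_increase_p_value(before, after))

# With only three days on each side, a doubling is much less convincing.
print(poisson_rate_increase_p_value([1, 0, 2], [2, 1, 3]))
```

The point of the comparison is that low absolute values do not by themselves invalidate a finding: with enough days, even small counts can separate two rates clearly, while a large relative jump observed over a handful of days can still be compatible with chance.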

Kind regards,

