Misused statistical concepts

I have come across these basic truths of statistics several times while learning. Nevertheless, I keep reading press articles and hearing news reports that ignore them in order to tell stories that are anything but true.

The idea of writing about them has crossed my mind several times, but today I read a post on “quora.com” that was really spot-on. So, what are the biggest statistical misconceptions? Here we go.

  • Correlation is not Causation: when we compare statistical variables with similar trends (maybe over similar periods) and find that they correlate, we may be inclined to think that they are causally related. This is where the “statistical mantra” applies: when two variables correlate, it does not mean that one “causes” the other. Moreover, correlation is a symmetric relation: A correlates with B exactly as much as B correlates with A, so correlation alone cannot tell us which way any influence runs. We should therefore be asking ourselves: does one of these actually cause the other? An example of how you can get it wrong: children who get tutored get worse grades than children who do not get tutored, most likely because it is the struggling children who get tutored in the first place. Another one, from https://pubs.acs.org/doi/abs/10.1021/ci700332k:

Isn't it obvious that something must be wrong when comparing these two quantities? The highway fatality rate must depend on other factors: improved road safety, maybe? Better cars? This one makes me laugh every time!

  • Statistical significance: here everything depends on how the sample data were collected. If we get this basic step wrong, a bigger sample does not fix the error; it only makes us more confident in the wrong conclusion. While we think we are getting ever closer to rejecting the null hypothesis, properly collected data might have supported it more and more clearly instead.
  • Independence of variables being studied: are they really independent? This is a question almost nobody asks until it is too late and all the data have already been collected (and, again, collected badly).
  • Collecting samples with the desired result in mind: are you sure the data you are looking at were collected with an unbiased method? Is the outcome, or the very fact under scrutiny, influencing which data you select? This is at the root of many false positives in scientific studies.
  • Pearson correlation measures only linear relations: it cannot be applied to every kind of relationship, and a strong but nonlinear dependence can give a coefficient close to zero. Yet people seem to ignore this…
  • Not everything follows a Gaussian distribution, however much we would like it to: this mistake is especially common in financial institutions.
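The points about significance and biased collection can be illustrated together. This Python sketch (standard library only) assumes a population that is truly N(0, 1), so the null hypothesis “mean = 0” is true, and a hypothetical collection procedure that keeps negative values only half of the time. Because the bias does not average out, the one-sample z statistic grows roughly with the square root of the sample size: more data make us more confident in a wrong rejection, not less.

```python
import math
import random
import statistics


def biased_sample(n, rng):
    """Draw from N(0, 1) but keep negative values only half the time.

    The 50% rejection rule is a made-up stand-in for any collection method
    that is influenced by the result we hope to see.
    """
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if x > 0 or rng.random() < 0.5:
            out.append(x)
    return out


def z_statistic(sample, mu0=0.0):
    """One-sample z statistic for H0: mean == mu0, using the sample standard deviation."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - mu0) / se


rng = random.Random(0)
for n in (100, 1_000, 10_000):
    s = biased_sample(n, rng)
    # The true population mean is 0, yet |z| > 1.96 would "reject" H0 at the 5% level.
    print(f"n = {n:>6}: mean = {statistics.fmean(s):.3f}, z = {z_statistic(s):.1f}")
```

Whatever the sample size, the sample mean settles near 0.27 (the mean of the distorted distribution), while the apparent precision keeps improving.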
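One common way the independence question bites is through the observations themselves: the familiar standard error of the mean, s/√n, assumes independent data. This Python sketch assumes positively autocorrelated data from an AR(1) process, x[t] = φ·x[t-1] + noise, with a hypothetical φ = 0.8, and compares that naive standard error with how much the sample mean actually varies across many repeated samples.

```python
import math
import random
import statistics


def ar1_series(n, phi, rng):
    """AR(1) process x[t] = phi * x[t-1] + e[t], e ~ N(0, 1), started from its stationary distribution."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return out


rng = random.Random(7)
n, phi, reps = 200, 0.8, 500
means, naive_ses = [], []
for _ in range(reps):
    xs = ar1_series(n, phi, rng)
    means.append(statistics.fmean(xs))
    naive_ses.append(statistics.stdev(xs) / math.sqrt(n))  # assumes independent observations

actual_se = statistics.stdev(means)          # how much the mean really varies
typical_naive = statistics.fmean(naive_ses)  # what the independence assumption claims
print(f"naive SE = {typical_naive:.3f}, actual SE = {actual_se:.3f}")
```

With φ = 0.8 the naive figure understates the real uncertainty by roughly a factor of three, so confidence intervals and p-values built on it look far more precise than they are.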
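For the Gaussian point, this Python sketch compares the probability of a move larger than k standard deviations under a Gaussian with the same probability under a Student t distribution with 3 degrees of freedom, rescaled to unit variance. The t(3) model is only an assumed stand-in for a fat-tailed distribution, such as those often seen in financial returns. Both tail probabilities are computed in closed form; the t formula used here holds only for 3 degrees of freedom.

```python
import math


def gaussian_tail(k):
    """P(|X| > k) for a standard normal X."""
    return math.erfc(k / math.sqrt(2.0))


def t3_unit_variance_tail(k):
    """P(|X| > k) for X = T / sqrt(3), T ~ Student t with 3 degrees of freedom.

    Var(T) = 3, so X has unit variance and is comparable to a standard normal.
    Uses the closed-form t(3) CDF; after rescaling, its argument u equals k.
    """
    u = k
    return 1.0 - (2.0 / math.pi) * (u / (1.0 + u * u) + math.atan(u))


for k in (3, 4, 5):
    g, t = gaussian_tail(k), t3_unit_variance_tail(k)
    print(f"{k} sd: Gaussian {g:.2e}, t(3) {t:.2e}, ratio {t / g:,.0f}")
```

A model that assumes Gaussian returns would treat a five-standard-deviation day as essentially impossible, while under this fat-tailed alternative it is thousands of times more likely.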