# The Seductiveness of Hans Rosling

Hans Rosling is an extremely fine presenter of data. His visualizations using Gapminder are excellent and very effective – sometimes perhaps seductive.

In his TED talk “The best stats you have ever seen” (2006) he shows a visualization of the percentage of the world population as a function of income per person per day. He maintains that the income gap has been decreasing and is disappearing. This depends on his definition of gap. If he means the dip (the relative minimum) in the curve, he is right. But if gap means income inequality between the poor and the rich, then he is not right. In fact income inequality has been increasing in recent years.

Hans Rosling exhorts all of us to use the enormous amount of data that exists for the benefit of all. He says:

“We need really to see them. We need to get them into graphic formats, where you can instantly understand them. Now, statisticians don’t like it, because they say that this will not show the reality; we have to have statistical, analytical methods.”

When Rosling says “instantly understand” I take him to mean “intuitively understand”. He is on the verge of seducing us into accepting that the correlation between the variables he visualizes implies causation.

But then he seems to feel uncomfortable with this and says:

“Many people say data is bad. There is an uncertainty margin, but…. the differences (in the data I use) are much bigger than the weakness of the data.”

This is of course an application of statistical thinking. He finally escapes, by the skin of his teeth, from giving the impression that he thinks correlation implies causation by saying:

“But this is hypothesis-generating.”

The visualizations that can be made with Gapminder are extremely fine and if you are not on your guard you can easily be seduced by them. The same applies to the equally fine visualizations made with Tableau.

# Correlation does not imply causation

When I began to use the new generation of data analysis and visualization software like Tableau I thought that I would first use them to address some of the most important problems of humanity like Resource Scarcity, Inequality, Poverty, Human Migration, Refugees, …

I have found large amounts of data relevant to these problems published on the Internet by various organizations and institutions, like the United Nations, The World Bank, The World Health Organization, … The data are usually in the form of data tables with countries, regions, and locations as rows, time periods as rows or columns, and variables as columns.

The data have been collected in surveys. The completeness of the data and their reliability are uncertain and variable.

The presentations of the data in the worksheets and dashboards of Tableau workbooks are very fine and I have no doubt that such presentations can increase the viewers' knowledge and understanding of the problems. But in order to solve a problem it is necessary to identify its cause or causes and then to eliminate or minimize them.

The presentations can be seductive. Viewers may be tempted to identify causes by calculating correlations between the variables in the data and assuming that correlations imply causation.

Statisticians know that correlation does not imply causation. What does this mean? Correlation is a measure of how closely two things are related. You may think of it as a number describing how consistently one thing changes when the other changes, with 1 being a perfect positive linear relationship between two sets of numbers, -1 being a perfect negative linear relationship and 0 being no linear relationship at all. “Correlation does not imply causation” means that just because two things correlate, one does not necessarily cause the other. Although this is an important fact, most people do not sufficiently take it into account. Their preconceptions tempt them to leap from correlation to causation without sufficient evidence.
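To make the number concrete, here is a minimal sketch in Python of the usual (Pearson) correlation coefficient. The helper name `pearson` and the three small data sets are my own illustrations, not taken from any of the sources discussed here. The last data set shows that the coefficient only measures linear association: `y` is completely determined by `x`, yet the coefficient is zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * x + 1 for x in xs]       # exact increasing straight line
falling = [-0.5 * x + 4 for x in xs]   # exact decreasing straight line
parabola = [x * x for x in xs]         # fully determined by x, but not linearly

r_rising = pearson(xs, rising)
r_falling = pearson(xs, falling)
r_parabola = pearson(xs, parabola)

print(f"rising:   r = {r_rising:+.2f}")    # +1.00
print(f"falling:  r = {r_falling:+.2f}")   # -1.00
print(f"parabola: r = {r_parabola:+.2f}")  # +0.00
```

None of the three values says anything about which variable, if any, causes the other.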

This can result in absurd and ridiculous causal claims. Tyler Vigen has recently published the second edition of his book “Spurious Correlations” (May 8, 2015).

He has designed software that scours enormous data sets to find spurious statistical correlations. In the Introduction to the book he says:

“Humans are biologically inclined to recognize patterns….Does correlation imply causation? It’s intuitive, but it’s not always true. …Correlation, as a concept, means strictly that two things vary together…(but) Correlations don’t always make sense.

Provided enough data, it is possible to find things that correlate even when they shouldn’t. The method is often called “data dredging.” Data dredging is a technique used to find something that correlates with one variable by comparing it to hundreds of other variables. Normally scientists first hypothesize about a connection between two variables before they analyze data to determine the extent to which that connection exists.

Instead of testing individual hypotheses, a computer program can data dredge by simply comparing every dataset to every other dataset. Technology and data collection in the twenty-first century makes this significantly easier….This is the world of big data and big correlations….

Despite the humor, this book has a serious side. Graphs can lie, and not all correlations are indicative of an underlying causal connection. Data dredging is part of why it is possible to find so many spurious relationships….Correlations are an important part of scientific analysis, but they can be misleading if used incorrectly.”

Vigen, Tyler. Spurious Correlations. Hachette Books. Kindle Edition. May 2015.
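The data dredging that Vigen describes can be imitated in a few lines of Python. This is only a sketch under my own assumptions, not his software: the "data sets" are 500 short random walks produced by a seeded random number generator, so by construction none of them has anything to do with the target series. The function names `random_walk` and `dredge` are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def random_walk(rng, length):
    """Accumulated Gaussian steps: a made-up yearly series with no real meaning."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series

def dredge(target, candidates):
    """Compare target with every candidate; return (index, r) of the strongest match."""
    scored = [(i, pearson(target, c)) for i, c in enumerate(candidates)]
    return max(scored, key=lambda pair: abs(pair[1]))

rng = random.Random(2015)  # fixed seed so the run is repeatable
target = random_walk(rng, 12)                            # e.g. 12 yearly values
candidates = [random_walk(rng, 12) for _ in range(500)]  # 500 unrelated series

best_index, best_r = dredge(target, candidates)
print(f"strongest match among 500 unrelated series: #{best_index}, r = {best_r:.2f}")
```

With this many candidates and so few data points per series, the strongest match will typically look impressive even though every series is pure noise. That is the whole trick: the search itself manufactures the correlation.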

Why is it that people are so easily allured/seduced into assuming that correlation implies causation? Vigen states: “Humans are biologically inclined to recognize patterns”. This reminds me of a blogpost in “Science or not” by Graham Coghill called “Confusing correlation with causation: rooster syndrome”.

http://scienceornot.net/2012/07/05/confusing-correlation-with-causation-rooster-syndrome

He quotes: The rooster crows and the sun rises

And then he says: “This is the natural human tendency to assume that, if two events or phenomena consistently occur at about the same time, then one is the cause of the other. Hence “rooster syndrome”, from the rooster who believed that his crowing caused the sun to rise….

We have an evolved tendency to believe in false positives – when event B follows soon after event A, we assume A was the cause of B, even if this is untrue. In evolution, such beliefs are harmless, whereas the belief that A is not the cause of B when it actually is (false negative) can be fatal. Michael Shermer explains: “For example, believing that the rustle in the grass is a dangerous predator when it is only the wind does not cost much, but believing that a dangerous predator is the wind may cost an animal its life.”

Michael Shermer wrote an article in Scientific American with the title “Patternicity: Finding Meaningful Patterns in Meaningless Noise”.

http://www.scientificamerican.com/article/patternicity-finding-meaningful-patterns/

He says: “Why do people see faces in nature, interpret window stains as human figures, hear voices in random sounds generated by electronic devices or find conspiracies in the daily news? A proximate cause is the priming effect, in which our brain and senses are prepared to interpret stimuli according to an expected model.

Is there a deeper ultimate cause for why people believe such weird things? There is. I call it “patternicity,” or the tendency to find meaningful patterns in meaningless noise. Traditionally, scientists have treated patternicity as an error in cognition. A type I error, or a false positive, is believing something is real when it is not (finding a nonexistent pattern). A type II error, or a false negative, is not believing something is real when it is (not recognizing a real pattern—call it “apatternicity”).

In my 2000 book How We Believe (Times Books), I argue that our brains are belief engines: evolved pattern-recognition machines that connect the dots and create meaning out of the patterns that we think we see in nature. Sometimes A really is connected to B; sometimes it is not. When it is, we have learned something valuable about the environment from which we can make predictions that aid in survival and reproduction.”

When data is collected in a non-random, uncontrolled survey, it is very hazardous to base decisions and actions on the assumption that correlation implies causation. It is impossible to know which correlations correspond to causation with a high probability and which are spurious. And it is impossible to estimate the risks associated with decisions and actions based on the assumption.

Correlations between variables calculated from data collected in a non-random, uncontrolled survey cannot be used for anything but to state hypotheses that can be tested in statistically sound research.
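One reason for this can be shown with a small simulation. The sketch below is entirely invented for illustration: a hidden variable `z` drives both `x` and `y`, while `x` has no effect on `y` at all. In the simulated "survey" the two are strongly correlated. When `x` is instead assigned at random, as it would be in a controlled experiment, the correlation disappears.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 5000

# Hidden common cause, never recorded in the "survey".
z = [rng.gauss(0, 1) for _ in range(n)]

# Uncontrolled survey: x and y both follow z; x does not influence y.
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Controlled experiment: x is assigned at random, independently of z.
# y is generated exactly as before, because x never entered its formula.
x_exp = [rng.gauss(0, 1) for _ in range(n)]
y_exp = [zi + rng.gauss(0, 0.5) for zi in z]

r_obs = pearson(x_obs, y_obs)
r_exp = pearson(x_exp, y_exp)
print(f"survey:     r = {r_obs:.2f}")  # strong, although x does not cause y
print(f"experiment: r = {r_exp:.2f}")  # close to zero
```

From the survey data alone there is no way to tell this situation apart from one in which `x` really does cause `y`.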

# The Fundamental Importance of Causation

Causation is extremely important. It is the most fundamental relation or connection in the universe. Without it there would be no science or technology. Our thoughts would not be connected with our actions and they would not be connected with consequences. There would be no moral responsibility and no legal system. Causation is the basis of prediction and explanation. Any intervention we make in the world around us is premised on there being causal connections that are, at least to some degree, predictable. Without it we would not be able to predict or explain anything. We would not be able to make decisions and not be able to act on these decisions. There would be no natural laws. There would be total chaos. Such a world is illustrated by the following picture of random points in a two-dimensional space. There is no correlation, no causal claims can be made, no prediction or explanation is possible, no rational decisions can be made and no rational actions can be taken.

The picture is generated by a Poisson process using a Monte Carlo random number generator. I took it from Peter Coles's blog “In the Dark”.

https://telescoper.wordpress.com/2009/04/04/points-and-poisson-davril/
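A picture of this kind can be reproduced with a short Python sketch. This is my own reconstruction under stated assumptions, not the code behind the picture on Peter Coles's blog: I assume a homogeneous Poisson point process on the unit square, with an intensity of 300 points per unit area chosen arbitrarily. The function names are hypothetical.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def poisson_points(rng, intensity, width=1.0, height=1.0):
    """Homogeneous Poisson point process on a width x height rectangle.

    The number of points is Poisson distributed with mean intensity * area;
    each point is then placed uniformly and independently of all the others.
    """
    n = poisson_sample(rng, intensity * width * height)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(4)  # fixed seed so the run is repeatable
points = poisson_points(rng, intensity=300)
xs = [p[0] for p in points]
ys = [p[1] for p in points]
r_xy = pearson(xs, ys)
print(f"{len(points)} points, correlation between x and y: {r_xy:.3f}")
```

Plotting `xs` against `ys` with any charting tool gives a scatter of points in which the eye readily finds clumps and voids, although the correlation between the coordinates is close to zero and nothing connects one point to another.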

We therefore cannot do without causation, and it is very important to be able to identify and establish causes.

Anyone who makes a causal claim must state the premises on which he/she bases the claim. He/she must have a theory of causation.

What is it for one phenomenon/event to cause another phenomenon/event?

Everybody thinks that he/she intuitively knows what causation means or is and how to make valid causal claims. However, philosophers and scientists have proposed a large number of theories and there is as yet no consensus about a single theory.

Because of the uncertainty about causation I recently decided to read deeply about it. I want to be able to identify causal relations that have a very high probability of being true, and to tell true positives and negatives apart from false positives and negatives. I want to avoid being seduced by spurious correlations.

This has proved to be very difficult and time-consuming. I have read about 15 books and a large amount of other material, but I have to admit that my knowledge and understanding are still marginal. This is understandable considering that philosophers and scientists have not been able to reach consensus about causation.

I shall continue to publish posts and pages about causation to this website in the hope that this will increase my knowledge and understanding about causation and perhaps also that of others.