Category Archives: Decision Analysis

Correlation does not imply causation

When I began to use the new generation of data analysis and visualization software, like Tableau, I thought that I would first use it to address some of the most important problems of humanity, like Resource Scarcity, Inequality, Poverty, Human Migration, Refugees, …

I have found large amounts of data relevant to these problems published on the Internet by various organizations and institutions, like the United Nations, The World Bank, The World Health Organization, … The data are usually in the form of data tables with countries, regions, and locations as rows, time periods as rows or columns, and variables as columns.

The data have been collected in surveys. Their completeness and reliability are uncertain and variable.

The presentations of the data in the worksheets and dashboards of Tableau workbooks are very fine, and I have no doubt that such presentations can increase the viewers' knowledge and understanding of the problems. But in order to solve a problem it is necessary to identify its cause or causes and to eliminate or minimize them.

The presentations can be seductive. Viewers may be tempted to identify causes by calculating correlations between the variables in the data and assuming that correlations imply causation.

Statisticians know that correlation does not imply causation. What does this mean? Correlation is a measure of how closely two things are related. The most common measure, the correlation coefficient, is a number between -1 and 1 that describes how consistently one thing changes when the other changes: 1 is a perfect positive linear relationship between two sets of numbers, -1 is a perfect negative linear relationship, and 0 is no linear relationship at all. “Correlation does not imply causation” means that just because two things correlate, one does not necessarily cause the other. Although this is an important fact, most people do not take it sufficiently into account. Their preconceptions tempt them to leap from correlation to causation without sufficient evidence.
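To make the number concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name pearson_r and the three small data sets are my own illustrations, not taken from any of the sources quoted in this post. The third data set is chosen to show that a coefficient of 0 only rules out a linear relationship: y is completely determined by x, and the coefficient is still zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y = 2x
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y = 12 - 2x
# y = (x - 3)^2 depends entirely on x, yet there is no linear relationship.
nonlinear = pearson_r(x, [4, 1, 0, 1, 4])

print(perfect_positive, perfect_negative, nonlinear)
```

None of the three numbers says anything about which variable, if any, causes the other.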

This can result in absurd and ridiculous causal claims. Tyler Vigen has recently published the second edition of his book “Spurious Correlations” (May 8, 2015).

http://www.amazon.com/gp/product/0316339431/ref=as_li_tl?ie=UTF8&camp=211189&creative=373489&creativeASIN=0316339431&link_code=as3&tag=tylervicom-20&linkId=UO6I3ENRRQUF255J

He has designed software that scours enormous data sets to find spurious statistical correlations. In the Introduction to the book he says:

“Humans are biologically inclined to recognize patterns….Does correlation imply causation? It’s intuitive, but it’s not always true. …Correlation, as a concept, means strictly that two things vary together…(but) Correlations don’t always make sense.

Provided enough data, it is possible to find things that correlate even when they shouldn’t. The method is often called “data dredging.” Data dredging is a technique used to find something that correlates with one variable by comparing it to hundreds of other variables. Normally scientists first hypothesize about a connection between two variables before they analyze data to determine the extent to which that connection exists.

Instead of testing individual hypotheses, a computer program can data dredge by simply comparing every dataset to every other dataset. Technology and data collection in the twenty-first century makes this significantly easier….This is the world of big data and big correlations….

Despite the humor, this book has a serious side. Graphs can lie, and not all correlations are indicative of an underlying causal connection. Data dredging is part of why it is possible to find so many spurious relationships….Correlations are an important part of scientific analysis, but they can be misleading if used incorrectly.”

Vigen, Tyler. Spurious Correlations. Hachette Books. Kindle Edition. May 2015.
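The data dredging Vigen describes can be imitated with a few lines of Python. The sketch below is not his software. It is my own illustration, and the names dredge, num_series, length and seed are hypothetical. It generates series of pure random noise, which by construction have no causal connection, compares every series with every other series, and reports the strongest correlation it finds.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge(num_series=100, length=10, seed=0):
    """Correlate every pair of independent random series.

    Returns a list of (|r|, i, j) tuples sorted from strongest to weakest.
    """
    rng = random.Random(seed)
    # Each series is independent Gaussian noise: nothing causes anything.
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    results = []
    for i, j in itertools.combinations(range(num_series), 2):
        results.append((abs(pearson_r(series[i], series[j])), i, j))
    results.sort(reverse=True)
    return results


results = dredge()
strongest_r, a, b = results[0]
print(f"{len(results)} pairs compared; "
      f"strongest |r| = {strongest_r:.2f} between series {a} and {b}")
```

With 100 short series there are 4950 pairs to compare, and some of them will correlate strongly by chance alone. The more variables are compared and the fewer observations each one has, the more impressive the spurious correlations become.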

Why are people so easily seduced into assuming that correlation implies causation? Vigen states: “Humans are biologically inclined to recognize patterns”. This reminds me of a blog post in “Science or not” by Graham Coghill called “Confusing correlation with causation: rooster syndrome”.

http://scienceornot.net/2012/07/05/confusing-correlation-with-causation-rooster-syndrome

He quotes: The rooster crows and the sun rises

And then he says: “This is the natural human tendency to assume that, if two events or phenomena consistently occur at about the same time, then one is the cause of the other. Hence “rooster syndrome”, from the rooster who believed that his crowing caused the sun to rise….

We have an evolved tendency to believe in false positives – when event B follows soon after event A, we assume A was the cause of B, even if this is untrue. In evolution, such beliefs are harmless, whereas the belief that A is not the cause of B when it actually is (false negative) can be fatal. Michael Shermer explains: “For example, believing that the rustle in the grass is a dangerous predator when it is only the wind does not cost much, but believing that a dangerous predator is the wind may cost an animal its life.”

Michael Shermer wrote an article in Scientific American with the title “Patternicity: Finding Meaningful Patterns in Meaningless Noise”.

http://www.scientificamerican.com/article/patternicity-finding-meaningful-patterns/

 

He says:  “Why do people see faces in nature, interpret window stains as human figures, hear voices in random sounds generated by electronic devices or find conspiracies in the daily news? A proximate cause is the priming effect, in which our brain and senses are prepared to interpret stimuli according to an expected model.

Is there a deeper ultimate cause for why people believe such weird things? There is. I call it “patternicity,” or the tendency to find meaningful patterns in meaningless noise. Traditionally, scientists have treated patternicity as an error in cognition. A type I error, or a false positive, is believing something is real when it is not (finding a nonexistent pattern). A type II error, or a false negative, is not believing something is real when it is (not recognizing a real pattern—call it “apatternicity”).

In my 2000 book How We Believe (Times Books), I argue that our brains are belief engines: evolved pattern-recognition machines that connect the dots and create meaning out of the patterns that we think we see in nature. Sometimes A really is connected to B; sometimes it is not. When it is, we have learned something valuable about the environment from which we can make predictions that aid in survival and reproduction.”

When data are collected in a non-random, uncontrolled survey, it is very hazardous to base decisions and actions on the assumption that correlation implies causation. It is impossible to know from the correlations alone which of them correspond to causation with a high probability and which are spurious. And it is impossible to estimate the risks associated with decisions and actions based on the assumption.

Correlations between variables calculated from data collected in a non-random, uncontrolled survey cannot be used for anything but stating hypotheses that can then be tested in statistically sound research.

Decision Making Methods

A large number of decision making methods have been developed. A few of them are listed below:

Decision Trees
Influence Diagrams
Multicriteria Decision Analysis
Analytic Hierarchy Process
Analytic Network Process
Hierarchical Influence Diagrams
PAPRIKA Method: “potentially all pairwise rankings of all possible alternatives”

These methods are implemented by various software packages, for example:

TreeAge Pro
AgenaRisk
1000minds
Analytica Free
PriEsT
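To illustrate the first method in the list, decision trees, here is a minimal sketch of how such a tree can be evaluated by "rolling back" expected values from the outcomes to the root. It does not reproduce how any of the packages above work. The dictionary layout, the function name rollback, and the payoffs and probabilities are all hypothetical.

```python
def rollback(node):
    """Return (expected value, best option label or None) for a tree node."""
    kind = node["type"]
    if kind == "outcome":
        return node["value"], None
    if kind == "chance":
        total_p = sum(p for p, _ in node["branches"])
        if abs(total_p - 1.0) > 1e-9:
            raise ValueError("branch probabilities must sum to 1")
        # Probability-weighted average of the child values.
        value = sum(p * rollback(child)[0] for p, child in node["branches"])
        return value, None
    if kind == "decision":
        best_label, best_value = None, float("-inf")
        for label, child in node["options"].items():
            value = rollback(child)[0]
            if value > best_value:
                best_label, best_value = label, value
        return best_value, best_label
    raise ValueError(f"unknown node type: {kind!r}")


# Hypothetical example: launch a product, license it, or do nothing.
tree = {
    "type": "decision",
    "options": {
        "launch": {
            "type": "chance",
            "branches": [
                (0.6, {"type": "outcome", "value": 100.0}),   # success
                (0.4, {"type": "outcome", "value": -50.0}),   # failure
            ],
        },
        "license": {"type": "outcome", "value": 30.0},
        "do nothing": {"type": "outcome", "value": 0.0},
    },
}

value, choice = rollback(tree)
print(f"best option: {choice}, expected value: {value:.1f}")
```

In this made-up example launching has an expected value of 0.6 × 100 + 0.4 × (-50) = 40, which beats the certain 30 from licensing. A decision maker who is risk averse might still prefer the certain amount, which is one reason the software supports the decision rather than makes it.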


Decision Analysis Software

Decision making software is a tool intended to support the decision making process but not to replace it. The software frees decision makers from the technical details of the decision-making method employed and makes it possible for them to focus on fundamental value judgments.

A large number of software packages are available. Their quality and price vary widely. Some of the packages are exorbitantly expensive. Others are less expensive or even free, yet of high quality. I have selected the following packages for my own use:

TreeAge Pro
AgenaRisk
1000minds
Analytica Free
PriEsT

These packages may be briefly characterized as follows:

TreeAge Pro is a visual modeling tool for building and analyzing decision trees, influence diagrams and Markov models. It employs Bayes analysis and multicriteria decision making.

AgenaRisk uses the latest developments from the field of artificial intelligence and visualisation to solve complex, risky problems. AgenaRisk enables decision-makers to measure and compare different risks in a way that is repeatable and auditable. The AgenaRisk solution includes predictive analytics and scales up to organisational-level risk monitoring and assessment. It is ideal for risk scenario planning.

1000minds is online software for multi-criteria decision making. The software implements the “potentially all pairwise rankings of all possible alternatives” (PAPRIKA) method.

Analytica is visual decision-making software. It combines hierarchical influence diagrams for visually creating and viewing models, arrays for working with multidimensional data, Monte Carlo simulation for analyzing risk and uncertainty, and optimization, including linear and nonlinear programming.

PriEsT is open-source decision-making software that implements the analytic hierarchy process method.
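As a rough illustration of what the analytic hierarchy process does with pairwise judgments, the sketch below derives criterion weights from a reciprocal comparison matrix using the row geometric mean, a common approximation to the principal eigenvector. I am not claiming that this is the algorithm PriEsT itself uses. The function name ahp_priorities, the criteria and the judgments are hypothetical.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP weights by normalized row geometric means.

    matrix[i][j] states how many times more important criterion i is
    than criterion j, so matrix[j][i] must equal 1 / matrix[i][j].
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("comparison matrix must be square")
    for i in range(n):
        for j in range(n):
            if matrix[i][j] <= 0:
                raise ValueError("judgments must be positive")
            if abs(matrix[i][j] * matrix[j][i] - 1.0) > 1e-9:
                raise ValueError("matrix must be reciprocal")
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments: cost is twice as important as quality and
# four times as important as support; quality is twice as important
# as support.
criteria = ["cost", "quality", "support"]
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_priorities(comparisons)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
```

For these perfectly consistent judgments the weights come out at roughly 0.57, 0.29 and 0.14. Real judgments are rarely this consistent, and a full implementation would also report a consistency measure, which this sketch omits.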


Data and Decision Analytic Process

The data and decision analytic process is a path leading from the larva of data to the butterfly of knowledge, understanding, and insight.

Before starting to work on data analytic and associated decision analytic projects it is necessary, in order to ensure the quality of the results, to

  1. define an orderly data analytic and decision analytic process
  2. select methods for executing the process
  3. select software packages for implementing the methods

In order to ensure the reliability of the answers/solutions and the quality of the decisions made and actions taken, it is necessary to adhere to the analytic process in an orderly manner and apply the methods and the software packages in a competent manner.

Before starting an analytic process it is necessary to state the question/problem under consideration and ask the following preliminary questions:

  1. Is the answer/solution considered known?
  2. Is the answer/solution based on sufficiently recent/reliable data?
  3. Was the analysis performed in a competent/reliable manner?
  4. Are the results of the analysis presented/visualized in such a way that they sufficiently increase the understanding and insight of the target group?
  5. Do the results of the analysis, their presentation/visualization, and the resulting understanding and insight form a sufficiently firm basis for decision making and action?

If any of the answers is no, there may be a reason to go ahead with the data analytic and decision analytic process. If all the answers are yes, it is unnecessary to go ahead with the process unless you are confident that you can improve the results materially or introduce your particular results to a new or wider audience. But beware of hubris.

The main stages of a combined data analytic and decision analytic process

  1. State an important question/problem
  2. Data analysis
    1. Select data relevant to answering the questions or solving the problems
    2. Prepare the data for analysis. Employ visualization during preparation
    3. Analyze the data – Increase knowledge about the past, present, and future state of the system generating the data – Increase knowledge about individual variables and the relationship between variables. Employ visualization extensively during analysis
      1. Descriptive data analysis
      2. Exploratory data analysis
      3. Confirmatory data analysis
      4. Predictive data analysis
    4. Present/visualize the results of the analysis
    5. Evaluate the results of the analysis – Have the original questions been answered?
  3. Decision analysis
    1. Make decisions based on the results of the analysis
    2. Implement decisions – Act
    3. Present/visualize the results of the actions
    4. Evaluate the results of the actions – Have the problems originally posed been solved?
  4. Reiterate the process or its individual stages as necessary


Decision Analysis

Decision Analysis is a systematic, quantitative and visual approach to addressing and evaluating important choices confronted by decision makers. Decision analysis utilizes a variety of tools to evaluate all relevant information to aid in the decision making process.

From <http://www.investopedia.com/terms/d/decision-analysis.asp>

After all of the alternatives have been analyzed and a final decision has been reached, there are steps that should be taken during the implementation process for that decision. Three essential actions to implementing a decision include creating an implementation plan, informing stakeholders, and finally, adjusting the decision to make compromises as necessary.

From <https://www.boundless.com/management/decision-making/decision-making-process/implement-the-course/>
