Reliability of Basic Exploratory Data Analysis – 1

Lately I have been wondering how reliable the results of the various types of exploratory data analysis are as support for decisions and actions. How confident can we be in decisions and actions based on these results, and how willing are we to take responsibility for the risks associated with them?

Confirmatory exploratory data analysis is characterized by:

Stating a hypothesis prior to data collection
Using probability sampling
Using descriptive statistics and visualizations
Using inferential statistics

Basic exploratory data analysis is characterized by:

Not stating a hypothesis before data collection
Not using probability sampling
Using descriptive statistics and visualizations
Not using inferential statistics
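To make the basic case concrete, here is a minimal Python sketch of what basic exploratory data analysis computes: descriptive statistics and a crude text histogram standing in for a visualization. The data and the helper names `describe` and `text_histogram` are hypothetical. Everything the sketch reports is a statement about the sample only.

```python
import statistics
from collections import Counter


def describe(sample):
    """Descriptive statistics of the sample itself; no claim about any population."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),
        "min": min(sample),
        "max": max(sample),
    }


def text_histogram(sample, bin_width):
    """Crude text histogram: one row of '#' per bin of width bin_width."""
    bins = Counter((x // bin_width) * bin_width for x in sample)
    return "\n".join(
        f"{start:>4}-{start + bin_width - 1:<4} {'#' * bins[start]}"
        for start in sorted(bins)
    )


# Hypothetical sample, e.g. ages of whoever happened to answer a web survey.
ages = [23, 25, 31, 34, 35, 38, 41, 42, 47, 52, 58, 61]
print(describe(ages))
print(text_histogram(ages, 10))
```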

Confirmatory exploratory data analysis can give reliable descriptions and visualizations of the sample. It can also be assumed to give reliable estimates of the statistics of the population from which the probability sample was drawn, and to give reliable support to decisions and actions.
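A minimal sketch of the confirmatory case, assuming a simple random sample drawn without replacement from a hypothetical population consisting of the numbers 1 to 1000. The 95% confidence interval uses the normal approximation with z = 1.96 and omits the finite population correction, which makes it slightly conservative here. The function names are illustrative, not taken from any particular library.

```python
import math
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Simple random sample without replacement: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the population mean (normal approximation, no finite population correction)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


population = list(range(1, 1001))  # hypothetical population; true mean is 500.5
sample = simple_random_sample(population, 100, seed=42)
low, high = mean_confidence_interval(sample)
print(f"sample mean {statistics.mean(sample):.1f}, 95% CI ({low:.1f}, {high:.1f})")
```

Because the selection is random, roughly 95% of intervals built this way would cover the true mean of 500.5; no single interval is guaranteed to.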

Basic exploratory data analysis can also give reliable descriptions and visualizations of the sample. However, it cannot be used to give reliable estimates of the statistics of the population from which the nonprobability sample was taken, and it cannot be used to give reliable support to decisions and actions.
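A small simulation can illustrate why. In this sketch the population is again the hypothetical numbers 1 to 1000, and the convenience sample is modelled, purely as an assumption, by a selection mechanism in which only the 100 largest values are easy to reach. The random sample lands near the population mean; the convenience sample does not, and nothing in the convenience sample itself reveals how far off it is.

```python
import random
import statistics

population = list(range(1, 1001))  # hypothetical values; true mean is 500.5

# Probability sample: every unit has the same chance of being selected.
random_sample = random.Random(1).sample(population, 100)

# Convenience sample (hypothetical mechanism): only the 100 units with the
# largest values are easy to reach, e.g. the most engaged customers answer a survey.
convenience_sample = sorted(population)[-100:]

print("population mean:        ", statistics.mean(population))
print("random sample mean:     ", statistics.mean(random_sample))
print("convenience sample mean:", statistics.mean(convenience_sample))
```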

Wikipedia has an entry about nonprobability sampling:

Nonprobability sampling

Sampling is the use of a subset of the population to represent the whole population. Probability sampling, or random sampling, is a sampling technique in which the probability of getting any particular sample may be calculated. Nonprobability sampling does not meet this criterion and should be used with caution. Nonprobability sampling techniques cannot be used to infer from the sample to the general population. It is nearly impossible to describe quantitatively the relation between a nonprobability sample and the underlying population of interest.
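For the simplest probability design, simple random sampling without replacement of n units from a population of N, the probability the quote refers to can be written down directly: each of the C(N, n) possible samples has probability 1 / C(N, n). A short Python sketch (the function name is hypothetical):

```python
from math import comb


def srs_sample_probability(population_size, sample_size):
    """Probability of drawing one particular subset under simple random sampling without replacement."""
    return 1 / comb(population_size, sample_size)


print(srs_sample_probability(5, 2))       # C(5, 2) = 10 possible samples, so 0.1
print(srs_sample_probability(1000, 100))  # tiny, but known exactly in advance
```

For a convenience sample there is no corresponding calculation, which is the point the quote makes.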

The website of the Human Rights Data Analysis Group has a fine article about nonprobability samples, which they call convenience samples.

Convenience Sampling

“Absent a probability-based selection procedure, it is nearly impossible to describe quantitatively the relationship between a convenience sample and the underlying population of interest.”

“Statistical inference is appropriate in three cases:

  1. Data are from a random sample
  2. Data are a complete enumeration (e.g., a census)
  3. Multiple (possibly non-random) samples are analyzed using Multiple Systems Estimation (MSE)”
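The simplest instance of Multiple Systems Estimation is the two-list capture-recapture (Lincoln-Petersen) estimator: if list A has n_A records, list B has n_B records, and m individuals appear on both, the population size is estimated as n_A × n_B / m. The sketch below assumes the two lists are independent and that every individual is equally likely to be recorded. Practical MSE, as used by the Human Rights Data Analysis Group, works with more lists and more careful models, so treat this only as an illustration. The record identifiers are hypothetical.

```python
def lincoln_petersen(list_a, list_b):
    """Two-list capture-recapture estimate of the total population size.

    Assumes the lists are independent and every individual has the same
    chance of appearing on each list; real MSE work relaxes these assumptions.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between lists: the estimate is undefined")
    return len(a) * len(b) / overlap


# Hypothetical records: two sources documenting the same kind of events.
source_a = [f"case{i}" for i in range(100)]      # 100 records
source_b = [f"case{i}" for i in range(80, 160)]  # 80 records, 20 shared with source_a
print(lincoln_petersen(source_a, source_b))      # 100 * 80 / 20 = 400.0
```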

Still, some analysts use nonprobability samples and apply inferential statistics to them.
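A sketch of what goes wrong, again assuming the hypothetical population of 1 to 1000 and a hypothetical convenience sample made up of the 100 largest values. The interval formula is the usual normal-approximation 95% interval. Applied to this sample it is narrow and looks confident, yet it excludes the population mean of 500.5, because the formula only accounts for random sampling error, not for the way the sample was selected.

```python
import math
import statistics

population = list(range(1, 1001))       # hypothetical population; true mean 500.5
convenience_sample = population[-100:]  # hypothetical convenience sample: the 100 largest values

# Standard 95% interval for a mean; only valid if the sample were random.
m = statistics.mean(convenience_sample)
se = statistics.stdev(convenience_sample) / math.sqrt(len(convenience_sample))
low, high = m - 1.96 * se, m + 1.96 * se

print(f"'95% CI' from convenience sample: ({low:.1f}, {high:.1f})")
print("covers the population mean?", low <= statistics.mean(population) <= high)  # False
```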

When I think about data analysts laboring on basic exploratory data analysis and applying inferential statistics to a nonprobability sample, I am reminded of:

Baron von Munchhausen’s Journey to and from the Moon

For reasons that are not entirely clear to me, this practice reminds me of the following story about how Baron von Munchhausen climbed up to and down from the moon, and then up out of a deep hole:

“I recollected that Turkey-beans grow very quick, and run up to an astonishing height. I planted one immediately; it grew, and actually fastened itself to one of the moon’s horns. I had no more to do now but to climb up by it into the moon, where I safely arrived, and had a troublesome piece of business before I could find my silver hatchet, in a place where everything has the brightness of silver; at last, however, I found it in a heap of chaff and chopped straw. I was now for returning: but, alas! the heat of the sun had dried up my bean; it was totally useless for my descent: so I fell to work, and twisted me a rope of that chopped straw, as long and as well as I could make it. This I fastened to one of the moon’s horns, and slid down to the end of it. Here I held myself fast with the left hand, and with the hatchet in my right, I cut the long, now useless end of the upper part, which, when tied to the lower end, brought me a good deal lower: this repeated splicing and tying of the rope did not improve its quality, or bring me down to the Sultan’s farm. I was four or five miles from the earth at least when it broke; I fell to the ground with such amazing violence, that I found myself stunned, and in a hole nine fathoms deep at least, made by the weight of my body falling from so great a height: I recovered, but knew not how to get out again; however, I dug slopes or steps with my finger-nails [the Baron’s nails were then of forty years’ growth], and easily accomplished it.”

Munchhausen and the Moon
