Data does not have an agenda and it does not lie, but it rarely shows the whole story.

Think of the data that would be gathered and analysed after the announcement that a chunk of native forest was to be opened up for logging. The botanists would have one set of data and analysis of the impact, the accountants another, the entomologists another, those concerned with native animal habitat another, and so on. None are wrong, but all are incomplete without the input of the others.

Corporate use of data does have an agenda: performance and, unfortunately, often personal advancement. Similarly, data delivered as fact by a politician has an agenda: getting elected.

The data does not have an agenda; those who use it often do.

Bias in data can be conscious as well as unconscious. Someone has to decide what data is collected, what hypotheses to test, and how the data is to be used. All can be shaped to meet a predetermined outcome.

When making a major decision we all look for the data that will give us confidence in our choice.

However, we are all also familiar with the nagging feeling that the data we are looking at is nothing short of bullshit.

So how can you tell?

Here are 11 simple tests to apply.

  • Where did the data come from? Organisations, geographies, people, all make a difference.
  • Was the collection method designed by someone with a vested interest in the outcome?
  • What are the gaps in the data? These can easily be created by the manner in which questions are asked or, often, not asked.
  • What assumptions were made in assembling and analysing the data? No data survives the filtering imposed by the assumptions in the assembly and analysis processes.
  • What statistical measures have been applied? The number of initial data points, upper and lower control limits, and confidence levels are all statistical tools that are available, but too often dismissed by non-statisticians and those running an agenda.
  • Be wary of creative articulation. Percentages are regularly thrown about as ‘proof’ of something. A 50% increase in accidents in your suburb in the past year may mean there were 3, compared to 2 the year before. Similarly, averages are often misleading. We expect the mean to be close to the median (the middle value when the data is sorted) but often it is not.
  • Who gains or loses from the outcome? Just look at the current political ‘debate’ in this country for ample evidence of this. There are no laws about truth in advertising for political ads, so the numbers quoted are heavily edited or, it would seem, often just made up.
  • Is the data describing just correlation, or is it truly causation? Correlation is often passed off as causation to make a case. For example, The Economist a while ago put forward a compelling case ‘proving’ that intelligence increased with consumption of ice cream.
  • What are the alternative explanations of the conclusions articulated, and what are we not being told?
  • Is the data giving you the answer to the question being asked, or to some other question? And how well is the question reflected in the answer?
  • Has anyone with an established perspective opposite to the outcome of the data had a critical look at it? This is often a good way of finding the holes in the collection and analysis.
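
The percentage and average traps in the list are easy to check for yourself. Here is a minimal sketch in Python: the accident counts are the ones from the example in the list, while the salary figures are made up purely for illustration and are not from any real data set.

```python
from statistics import mean, median


def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# The accident example: 2 accidents the year before, 3 in the past year.
accidents_before, accidents_now = 2, 3
change = percent_change(accidents_before, accidents_now)
absolute = accidents_now - accidents_before

# Hypothetical salaries (invented numbers): one large value drags the mean up,
# while the median stays with the bulk of the data.
salaries = [40_000, 45_000, 50_000, 55_000, 500_000]
salary_mean = mean(salaries)
salary_median = median(salaries)

print(f"{change:.0f}% increase, which is {absolute} extra accident")
print(f"mean {salary_mean:,.0f} vs median {salary_median:,.0f}")
```

When the headline number is a percentage, ask for the raw counts behind it. When it is an average, ask whether it is the mean or the median, and how far apart the two are.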

While statistics can be made to lie, they will also deliver transparency when you understand the basic measures. People will often tell you what they think you want or need to hear, and when it is backed by data, it becomes more credible, particularly if it confirms an already established point of view.

Finally, if it seems too good to be true, there is a fair chance that it is. Our instincts are usually pretty good, so follow them until proved otherwise.

I am by no means a data nerd, but I do believe that good data can make our collective lives better by improving decision making, and removing just a little of the bullshit sprayed at us so regularly and methodically by everyone with a cause.

Data does not lie; people using data can, and do.

The header cartoon is from David Somerville’s Random Blather blog, an extension of Hugh MacLeod’s original.