The old adage that you can find data to support almost any proposition, no matter how wild, has never been truer than it is today.
On one hand, we see politicians telling us the science is wrong when it warns of the looming catastrophe of climate change, while at the same time lauding the science behind the world's response to the Covid pandemic, which delivered new vaccines in record time.
The contradiction is extreme; nevertheless, there is always data to ‘prove’ whatever point is required.
Following are some of the common ways data is manipulated to mislead, misinform, and bamboozle the unwary.
- Confusing correlation with causation. This is very common, and I have written about it on several occasions. Just because the graphs of ice cream sales and shark attacks mirror each other does not mean one caused the other.
- The Cobra effect. This refers to the unintended negative consequences that arise from an incentive designed to deliver a benefit. The name comes from an effort by the British Raj to reduce the number of cobras, and the associated deaths, in Delhi by offering a bounty on each dead cobra. Entrepreneurial Indians started to breed them for the bounty. The same thing happened when the French wanted to reduce the rat population of French Indochina: they put a bounty on rats’ tails, which resulted in enterprising Vietnamese catching rats for their tails and then releasing them to breed further.
- Cherry picking. Finding results, no matter how obscure, that support your position, and excluding any data that might expose the error. A favourite political ploy, and one having a great run currently.
- Sampling bias. Drawing conclusions from data taken from an unrepresentative sample. It is often challenging to select a sample that delivers reliable conclusions, and often far too easy to select one that delivers a predetermined outcome. Again, a favoured political strategy.
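To see how badly a convenient but unrepresentative sample can mislead, here is a toy simulation with invented figures: support for a policy differs by age group, and a poll that only reaches older respondents (think landline-only phone polling) reports a very different number from the true population figure.

```python
import random
from statistics import mean

random.seed(7)

# Invented population: support for a policy differs sharply by age group.
population = (
    [("young", 1 if random.random() < 0.70 else 0) for _ in range(5000)]
    + [("old", 1 if random.random() < 0.30 else 0) for _ in range(5000)]
)

true_support = mean(v for _, v in population)  # close to 50%

# A "convenience" sample drawn only from older respondents
# (e.g. a landline phone poll) tells a very different story.
biased_sample = random.sample([v for g, v in population if g == "old"], 500)
biased_estimate = mean(biased_sample)

print(round(true_support, 2), round(biased_estimate, 2))
```

The arithmetic is trivial; the damage comes entirely from who was allowed into the sample.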
- Misunderstanding probability. Often called the gambler’s fallacy, this leads you to conclude that after a run of five heads in a two-up game, the next throw must be tails. Each throw is a discrete 50/50 probability, no matter what the previous throws have been. Poker machine venues rely for their profits on players’ increasing belief that the ‘next one’ will be the ‘jackpot’ after a run of ‘bad ones’.
- The Hawthorne effect. The name comes from a series of experiments in the 1920s at the Hawthorne Works, a US factory producing electrical relays. Researchers altered lighting levels to observe the impact on worker productivity, and concluded that productivity improved when lighting was increased, only to drop back later. The lighting explanation was subsequently disproved, when psychologists recognised that people’s behaviour changes when they are, or believe they are, being observed. This can be a nasty trap for the inexperienced researcher conducting qualitative research.
- Gerrymandering. Normally this refers to the manipulation of geographic boundaries, usually electoral ones. It can equally describe the boundaries set around which source data is included in a sample: ‘fitting’ the data to deliver the desired outcome. The term originated in 1812, when Massachusetts Governor Elbridge Gerry signed a bill creating a highly partisan district near Boston whose shape resembled a salamander. The National Party held government in Queensland for 32 years until 1989 as a result of a massive gerrymander in its favour, perhaps better remembered as a ‘Bjelkemander’.
- Publication bias. Interesting or somehow sensational research is more likely to be published and shared than more mundane studies. In this age of social media, the effect is compounded by the ‘echo chamber’ of social platforms.
- Simpson’s paradox. This describes the situation where a trend evident in several data sets is eliminated or reversed when the data sets are combined. An example might be the current debate about university admissions favouring males over females: within individual faculties the numbers may favour one group, yet when the faculties are combined the apparent advantage can shrink, vanish, or even reverse. This was famously demonstrated in a study of admissions to UC Berkeley in 1973, where the aggregated figures appeared to show bias against women, but the faculty-by-faculty data did not. It remains a regular feature of misleading political commentary.
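The paradox is easiest to see with concrete numbers. In this sketch (the faculties and figures are invented for illustration), women are admitted at the higher rate within each faculty, yet the combined figures appear to favour men, because women apply disproportionately to the more competitive faculty.

```python
# Invented admissions figures: (applications, admissions) per group.
admissions = {
    "Engineering": {"men": (800, 480), "women": (100, 65)},   # 60% vs 65%
    "Law":         {"men": (200, 40),  "women": (900, 200)},  # 20% vs ~22%
}

def rate(pair):
    applied, admitted = pair
    return admitted / applied

# Within EACH faculty, women are admitted at the higher rate...
for faculty, groups in admissions.items():
    print(faculty, round(rate(groups["men"]), 2), round(rate(groups["women"]), 2))

# ...yet combined, men appear favoured, because women apply
# disproportionately to the more competitive faculty (Law).
def overall(group):
    applied = sum(g[group][0] for g in admissions.values())
    admitted = sum(g[group][1] for g in admissions.values())
    return admitted / applied

print("overall:", overall("men"), overall("women"))
```

Which view is the ‘right’ one depends on the question being asked, which is exactly why the paradox is so easy to exploit.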
- The McNamara fallacy. This arises when reliance is placed solely on data in extraordinarily complex situations, ignoring the ‘big picture’ and assuming rationality will prevail. The name refers to Robert McNamara, US Secretary of Defence under Presidents Kennedy and Johnson, whose reliance on data unintentionally helped lead the US into the disaster that was Vietnam, a mistake he later acknowledged.
Data is an essential ingredient in making your case, as it conveys rationality and truth. When listening to a case being made to you, be very careful, as numbers have an uncanny ability to lie. To protect yourself, keep at least some of these traps in mind.
Header illustration credit: Smithsonian. The drawing is of the electoral district created by Massachusetts Governor Elbridge Gerry in 1812 to ‘steal’ an election.