Statistics and the human cost of the war in Iraq
Many commenters on the Lancet study (pdf) boggle at the numbers, point at the uncertainty, express disbelief, and note that they’re not statisticians. Well, I’m here to help.
Although perhaps not very much. I’m not a statistician either. I scraped the bottom of the barrel as a student taking my one required stat class. It was only because Dick Lewontin was a brilliant teacher and exceedingly merciful that I passed at all. But in some ways that may make it easier for me to explain. I know what we all go through when statistics get thrown at us.
I won’t be discussing specifics of the methodology or how they collected data. (For what my opinion is worth, their methodology is excellent.) Billmon, Zeyad, and the Lancet article itself go into that in exhaustive detail. (Update, Oct 19. Another English- rather than statistics-based discussion by Greg Mitchell. Yet one more: Riverbend gives her usual excellent personal take on the numbers.) Iraq Body Count has a much lower number (about 43,000 at the low end of the estimate) because that is a tally purely of deaths reported in various media. Anyone who thinks that the media are cataloguing every single death in Iraq is living in a dreamworld. Of course IBC’s estimate is vastly lower.
I’d like to (try to) explain in a nutshell what the overall numbers in the Lancet article mean.
The main thing that seems to have people’s knickers in a twist is the level of uncertainty surrounding the estimates of the true number of excess deaths. (It’s worth pointing out that the uncertainty would be much lower if the US had lived up to its obligations as an occupier and kept as good a count as it could of deaths in the country.)
There are two different kinds of uncertainty: the uncertainty of not knowing whether your numbers are right because of the difficulty of collecting the data, and the statistical measure of uncertainty. The broad range of estimates, 392,979 – 942,636, in the Lancet article is due to the difficulty of collecting data. Since getting the data is difficult, the distribution of estimates of the real number of deaths will look like the blue line below. Note that the line does NOT represent numbers of deaths. It represents estimates of what the actual real number is.
(Graphs modified from Wikipedia, showing generic normal distributions to illustrate the concepts discussed. These are not from the Lancet.)
The important thing to remember is that the statistics tell you how much chance you have of guessing wrong. The true number has a 68% likelihood of being somewhere in the blue zone in the lower graph above. It has a 95% likelihood of being somewhere within the blue plus beige zones. In the top graph, the 95% zone lies between the dashed lines: as discussed below, that’s a narrow range for the red line, broad for the blue one.
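For anyone who wants to check those percentages rather than take my word for it, here is a minimal sketch in Python. It assumes a generic bell curve (a standard normal distribution), just like the illustrative graphs, and the function name prob_within is my own invention, not anything from the Lancet paper.

```python
from statistics import NormalDist

# Assumption: a generic bell curve with mean 0 and standard deviation 1,
# matching the illustrative graphs, not the Lancet data.
bell_curve = NormalDist(0, 1)

def prob_within(k, dist=bell_curve):
    """Chance that a value lands within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

within_1sd = prob_within(1)      # the blue zone
within_196sd = prob_within(1.96) # the conventional 95% zone
within_2sd = prob_within(2)      # blue plus beige, rounded to 95% on the graph

print(f"within 1 sd:    {within_1sd:.1%}")
print(f"within 1.96 sd: {within_196sd:.1%}")
print(f"within 2 sd:    {within_2sd:.1%}")
```

The "95%" on the graph is the rounded two-standard-deviation figure. The 95% used for confidence ranges corresponds to about 1.96 standard deviations either side of the middle.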
With good data, the chance that your estimate will be far from the true number (the “0” at the center of the graphs) is low, so the curve is steep and pointy. If, for instance, the true number of excess deaths were 655,000, and the necessary records to count the number of deaths were easily available, the likelihood of arriving at an estimate of, say, 600,000 would be vanishingly small. Ninety-five percent of the estimates might fall within, for instance, plus or minus 10,000 deaths of the true number, as depicted by the dashed lines in the top graph.
With hard-to-collect data, the chance of estimating wrong is much higher. The likelihood that the real number was 600,000 is not vanishingly small. It’s quite large, and 600,000 may, in fact, be the real number. So may 700,000. Both are about equally likely. If one wants to stress that the number of excess deaths could be as low as 393,000 according to this study, one has to also stress that it could be as high as 943,000. The uncertainty of the estimate means higher numbers are as likely as lower ones.
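To put rough numbers on the difference between the pointy curve and the flat one, here is a small sketch. The assumptions are mine: both curves are plain normal distributions centered on the hypothetical 655,000, the good-data curve has a 95% range of plus or minus 10,000, and the hard-data curve is as wide as the rounded 393,000 to 943,000 range. The helper names are made up for illustration.

```python
from statistics import NormalDist

# About 1.96: how many standard deviations cover the middle 95% of a bell curve.
Z95 = NormalDist().inv_cdf(0.975)

CENTER = 655_000  # hypothetical true number used in the text

def curve_from_95_range(center, half_width):
    """Bell curve whose middle 95% spans center +/- half_width (an assumption)."""
    return NormalDist(center, half_width / Z95)

def relative_likelihood(curve, value):
    """Height of the curve at value, as a fraction of its height at the peak."""
    return curve.pdf(value) / curve.pdf(curve.mean)

good_data = curve_from_95_range(CENTER, 10_000)
hard_data = curve_from_95_range(CENTER, (943_000 - 393_000) / 2)

good_600k = relative_likelihood(good_data, 600_000)
hard_600k = relative_likelihood(hard_data, 600_000)
hard_700k = relative_likelihood(hard_data, 700_000)

print(f"good data, 600,000: {good_600k:.2e} of peak height")
print(f"hard data, 600,000: {hard_600k:.2f} of peak height")
print(f"hard data, 700,000: {hard_700k:.2f} of peak height")
```

On the pointy curve, 600,000 sits so far out in the tail that its height is effectively zero. On the flat curve, both 600,000 and 700,000 are nearly as tall as the peak itself.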
What the range of numbers means is that there is high statistical certainty (95%, to be precise) that the real number of deaths falls within that range. The range encompasses the blue and the beige areas under the graph (and is represented by the hard-to-see dashed lines at the extreme right and left of the blue line in the top graph). That means there is a 95% probability that the true number of deaths falls somewhere between 392,979 and 942,636. There is less than a one in twenty chance that “only” 350,000 people have died due to the occupation, or that a million people have died. In other words, there is a great deal of statistical certainty that the range is correct. The midpoint of the range is the likeliest true number, but that is less certain.
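Here is a back-of-the-envelope check of that "one in twenty" claim. It is only a sketch: I am assuming the estimates follow a symmetric bell curve centered on the midpoint of the published range, which is a simplification of my own and not how the Lancet authors computed their interval.

```python
from statistics import NormalDist

LOW, HIGH = 392_979, 942_636  # the 95% range quoted from the study

# About 1.96: standard deviations covering the middle 95% of a bell curve.
Z95 = NormalDist().inv_cdf(0.975)

# Assumption: a symmetric normal curve fitted to the range.
midpoint = (LOW + HIGH) / 2
spread = (HIGH - LOW) / 2 / Z95
estimates = NormalDist(midpoint, spread)

p_inside = estimates.cdf(HIGH) - estimates.cdf(LOW)
p_below_350k = estimates.cdf(350_000)
p_above_1m = 1 - estimates.cdf(1_000_000)

print(f"chance the true number is inside the range: {p_inside:.1%}")
print(f"chance it is 350,000 or lower:              {p_below_350k:.1%}")
print(f"chance it is 1,000,000 or higher:           {p_above_1m:.1%}")
```

Under those assumptions, both tail chances come out well under one in twenty, and neither tail is favored much over the other.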
Hundreds of thousands of people have died. That is not in dispute any more than any other scientific conclusion that rests on a 95% confidence level (i.e. all biological and medical science).
So, now that I’ve cleared that up, can we stop pooh-poohing the numbers and start being appropriately horrified that hundreds of thousands of people have died?