
Estimating the spread of the Coronavirus

1700 plus or minus... what?

[Since I wrote this post, the Coronavirus story has moved on a long way. The wildly varying estimates of how many cases there have been suggest that my reaction to those first reports was right.]

Today's leading headline on the BBC was about a new virus, the Wuhan Coronavirus, that is emerging in China. Two people are reported to have died from it. The Chinese authorities were claiming there had been 41 cases so far, but we were then told that experts in the UK believe the number is "closer to 1700".

When you read that estimate of 1700, what does it make you think the 'real' number is?*  And on what do you imagine that estimate was based?

When I hear an estimate like 1700, especially one that has come from experts, I know that of course it isn't exactly the right figure (it's an estimate after all), but I assume that 1700 is probably quite close to the true figure. After all, they have quoted the number to two significant figures, so they were confident in saying 1700 rather than 'about 2,000'.  So it sounds like they are pretty certain that the real number of cases is somewhere in the range 1500 to 1900.
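To make that concrete, here is a minimal sketch of what rounding to a given number of significant figures implies. The helper names round_sig and implied_range are my own, made up for illustration. Read strictly, "1700 to two significant figures" claims an even narrower range than my loose 1500 to 1900.

```python
import math


def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)


def implied_range(x, sig):
    """Interval of values that would be reported as x at `sig` significant figures."""
    exponent = math.floor(math.log10(abs(x)))
    half_unit = 10 ** (exponent - sig + 1) / 2
    return (x - half_unit, x + half_unit)


print(round_sig(1723, 2))       # 1700
print(round_sig(1723, 1))       # 2000
print(implied_range(1700, 2))   # (1650.0, 1750.0)
print(implied_range(2000, 1))   # (1500.0, 2500.0)
```

So quoting "1700" implicitly promises something like 1650 to 1750, whereas "about 2,000" only promises 1500 to 2500.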

Fortunately in this case, I was able to look up where the number came from.  The analysis by Imperial College is here: Estimating Coronavirus Cases in Wuhan

The summary begins with an even bolder figure:

"We estimate a total of 1,723 cases of 2019-nCoV in Wuhan City".  Not 1722, not 1724 but 1723.

However, the important numbers are the ones that follow it: 95% CI 427 – 4471.

In plain English, what they are claiming is that - ACCORDING TO THEIR ASSUMPTIONS - they were 95% confident that the number of cases was somewhere in the range 427 to 4471 when they wrote the report. Already that's a huge range, with more than a factor of ten between the lower and upper ends. But who's to say that their assumptions are right? Sensibly, they use what's called sensitivity analysis to see what happens to the numbers if their assumptions are wrong. What if there have been more international travellers than they have assumed? What if the incubation period is shorter? These tweaks lead to a bigger range. Depending on the scenario, they accept the figure could easily be anywhere between 190 and 5590.
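A quick way to see how wide these ranges are is to compare the top of each range with the bottom. The sketch below uses only the figures quoted above; the function name spread is my own, and this is simple arithmetic on the published numbers, not a reconstruction of Imperial's model.

```python
def spread(low, high, central=None):
    """Summarise how wide a range is, as ratios rather than differences."""
    summary = {"high_over_low": high / low}
    if central is not None:
        # How many times smaller / larger the ends are than the central estimate
        summary["central_over_low"] = central / low
        summary["high_over_central"] = high / central
    return summary


reported = spread(427, 4471, central=1723)   # the 95% range around 1,723
scenarios = spread(190, 5590)                # the range across the sensitivity scenarios

print(round(reported["high_over_low"], 1))       # 10.5
print(round(reported["central_over_low"], 1))    # 4.0
print(round(reported["high_over_central"], 1))   # 2.6
print(round(scenarios["high_over_low"], 1))      # 29.4
```

In other words, the true figure could plausibly be a quarter of the headline number or more than two and a half times it, and across the scenarios the top of the range is nearly thirty times the bottom.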

How does the bald figure of "1700" sound now?  

It's good that Imperial have published their calculations and assumptions, but unfortunately, the number 1700 has now got the stamp of media endorsement, and before long it will become definitive. The huge uncertainty in the figure will become a footnote that will be largely ignored, so what we are left with is a number quoted to two significant figures that creates the impression that it is 'accurate'. Technically the experts are right to say that the true figure is probably closer to 1700 than 41. But then Pi is closer to 30 than it is to 1000 - but that doesn't tell us much about Pi.

Does this matter? Yes it does, because we make decisions based on our confidence in a number. Tourists will be taking account of 1700 when thinking about whether to go on holiday in Wuhan. Health practitioners across the world will be deciding whether to spend time planning for the virus depending on how big a threat they think it is.

It reminds me of the forecasts that emerged in the late 1990s when the first victims of variant CJD (vCJD) began to die horribly. Based on the number of deaths so far, scientists began to forecast the number of victims over the next 20 years. They reckoned it would be somewhere between 150 and 350,000. And it turns out they were right - because 20 years on, the true figure was just over 200 cases. That 350,000 figure now looks a bit silly. But the silliest part of it for me is the second significant figure. If they'd said 'It is possible that over a quarter of a million could contract vCJD' that would have given a more credible impression of the uncertainty of the number.