Thursday, 24 September 2020

Day 192 of self-isolation - Bayes theorem

Bayes theorem

I did my PhD on this, you can see it on my web site. It is applicable in many situations, and it often leads to results that are surprising.

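Let's look at Covid testing. The commonly used test in the UK gives about 0.1% false positives (it says you are infected when you are not; this is governed by the test's specificity, here about 99.9%) and 5% false negatives (it says you are clear when you are infected; this is governed by the sensitivity, here 95%).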


The tests being used in the UK are (I think) Abbott and Roche. So the sensitivity is 95% and the false positive rate is 0.2% (a specificity of 99.8%). I've heard reports that the sensitivity is 80% and the false positive rate is "under 1%", but I can't see where those numbers came from. I've given the source of my numbers.

Now, imagine that 100,000 people are tested per day (that's the government target). We're seeing about 6000 positives per day, so there are about 94,000 negatives. But if you tested 100,000 uninfected people, you would get 200 false positives (a 0.2% false positive rate). 200 false positives out of 6000 means that about 3% of reported infections might be false positives. That's a very small number, and it contradicts some numbers that are circulating amongst innumerate journalists, who claim that 91% of the positives are false positives. The mistake such people make is that they don't know about Bayes Theorem, and so they don't factor in the prior probability.
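The arithmetic above can be sketched in a few lines of Python. This is only a rough upper bound under the assumption stated in the text (every person tested is uninfected); the function name and its parameters are my own labels, not anything from the test manufacturers.

```python
def false_positive_share(tests_per_day, positives_per_day, false_positive_rate):
    """Upper bound on the fraction of reported positives that are false.

    Worst case (an assumption): every person tested is uninfected, so the
    false positive rate applies to all of the tests.
    """
    max_false_positives = tests_per_day * false_positive_rate
    return max_false_positives / positives_per_day


# Figures from the text: 100,000 tests, about 6000 positives, 0.2% false positives.
share = false_positive_share(100_000, 6_000, 0.002)
print(f"At most {share:.1%} of reported positives are false")
```

Because some of the people tested really are infected, the true share of false positives would be a little lower than this bound.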

So here's the wrong calculation:

1000 people turn up to be tested, only one has Covid, and the false positive rate is 1%, so 10 test positive. So nine out of ten are false positives, and you deduce that 90% of the positives are false, an over-estimate of cases that is tenfold! Awful!

And here's the right calculation. The wrong assumption is that only one of them has Covid. But why did they go for testing? Because they have symptoms. So the chances are, more than one of those thousand had Covid. A lot more!

We call this number, the guess about how many had Covid before we do the test, the "prior probability". So here's Bayes Theorem. The probability of the event A, given that B is true, is equal to the probability of the event B, given that A is true, multiplied by the probability of A and divided by the probability of B.
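Written as a formula, that sentence looks like this. The second line applies it to testing, with A meaning "is infected" and B meaning "tests positive". The symbols p (prior probability of infection), s (sensitivity) and f (false positive rate) are my own shorthand for this sketch.

```latex
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
\[
P(\text{infected} \mid \text{positive}) = \frac{s\, p}{s\, p + f\, (1 - p)}
\]
```

The denominator in the second line is just P(B) spelled out: a positive result comes either from an infected person who is correctly flagged, or from an uninfected person who is wrongly flagged.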


So, let's suppose that of the 1000 turning up for testing, 100 have Covid. The 95% sensitivity means that 95 of them are confirmed, and the 1% false positive rate means that about 10 more are flagged even though they are clean. So there are about 105 flagged as infected when the true number is 100, which means that the over-estimate of cases is about 5%.

The truth is somewhere in between. More likely, of those 1000 turning up for testing, 25 are infected. The 95% sensitivity means that 24 are confirmed, and the 1% false positive rate means that about 10 more are flagged. So the number flagged is 34 where the true number is 25, an over-estimate of cases by 36%. If we apply that to the current number of 6178, the true number would be 4543. So there might be some overstatement of cases, but it isn't a tenfold over-estimate.
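The three scenarios (1, 100 and 25 infected out of 1000) can be compared side by side with a small Python sketch. It treats the 1% figure as the false positive rate and applies it only to the uninfected people, without rounding, so the results differ slightly from the rounded figures in the text. The function and its field names are hypothetical, made up for this illustration.

```python
def flagged_counts(tested, infected, sensitivity, false_positive_rate):
    """Expected test outcomes for a group with a known number of infections."""
    # Infected people who are correctly flagged.
    true_positives = infected * sensitivity
    # Uninfected people who are wrongly flagged.
    false_positives = (tested - infected) * false_positive_rate
    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "flagged": flagged,
        "false_share": false_positives / flagged,
        "flagged_vs_true": flagged / infected,
    }


for infected in (1, 25, 100):
    r = flagged_counts(1000, infected, sensitivity=0.95, false_positive_rate=0.01)
    print(f"{infected:>3} infected: {r['flagged']:.1f} flagged, "
          f"{r['false_share']:.0%} of them false, "
          f"flagged/true = {r['flagged_vs_true']:.2f}")
```

With 25 infected, the unrounded count is 33.5 flagged rather than 34, but the conclusion is the same: the answer depends heavily on how many of the people tested were infected to begin with.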

But the false positive rate isn't 1%; it is more like 0.1% (see source above). So in the case above, you get 24 cases confirmed and 1 false positive, and the number flagged is 25, which is (hurrah) the correct number.
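To see how much the false positive rate matters, here is a minimal Python sketch of Bayes Theorem applied to a single positive result. It assumes the prior from the example above (25 infected per 1000 tested, which is my guess, not a measured figure) and a sensitivity of 95%. The function name is made up for this illustration.

```python
def probability_infected_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes Theorem: P(infected | positive test)."""
    # P(positive) = correctly flagged infected + wrongly flagged uninfected.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


prior = 25 / 1000  # assumed prior: 25 of every 1000 people tested are infected
for fpr in (0.01, 0.001):
    posterior = probability_infected_given_positive(prior, 0.95, fpr)
    print(f"false positive rate {fpr:.1%}: "
          f"P(infected | positive) = {posterior:.1%}")
```

Changing `prior` shows the other half of the argument: the lower the prior, the larger the share of positives that are false, which is exactly what the wrong calculation gets by assuming only one infected person in a thousand.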

Any statistician knows about Bayes Theorem, but I doubt if many journalists do. And doctors? Maybe, but I doubt if the medical training of most doctors extends to sophisticated mathematical analysis.
