Tuesday, August 7. 2012
For some reason Bayes' Theorem has been cropping up all over the blogs I read, and there's a huge amount of confusion about it. There are some rather awkward applications (you can try to build learning software by creating feedback loops over how strongly you believe certain things, but it gets computationally heavy and full of nasty numbers very quickly) or, thankfully easier to show, you can use it to predict hard-to-measure things from easy-to-measure things.
Imagine the following: you have a machine that detects terrorists that you want to put out an airport. It's fast and unobtrusive and you know how "good" it is. In some different circumstances how many people does it catch and how many are actually terrorists?
"Good" for such tests is usually expressed in terms of either sensitivity and specificity or false negatives and false positives. These concepts come in complementary pairs. Sensitivity is the measure of how often a 'positive' subject actually gets a 'positive' result - false negatives are the complement of that - how often do you get a negative result when you should get a positive one? Similarly, specificity is how often a negative subject gets a negative result, and false positives are the complement - how often a positive test result is given when the subject is negative. Fortunately, certainly for many types of test, these things are fairly easy to work out.
For the sake of an example, and some numbers, we're going to start with a small airport: 1,000 innocent travellers and 10 terrorists. We'll make the "good" test 90% sensitive and 90% specific. So, 90% - 9 of our 10 terrorists get identified. Good, you might say. However, only 90% (900) of our innocent travellers are correctly cleared, so 100 are falsely flagged. We're faced with a room of 109 people, and 100 of them are innocent and probably not very happy.
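The small-airport arithmetic above can be sketched in a few lines. This is just the post's own numbers plugged in; the variable names are mine:

```python
# Small airport example: 1,000 innocents, 10 terrorists, a 90%/90% test.
terrorists = 10
innocents = 1_000
sensitivity = 0.90  # P(positive result | terrorist)
specificity = 0.90  # P(negative result | innocent)

true_positives = terrorists * sensitivity        # terrorists correctly flagged
false_positives = innocents * (1 - specificity)  # innocents wrongly flagged
flagged = true_positives + false_positives

print(f"Flagged: {flagged:.0f} "
      f"({true_positives:.0f} terrorists, {false_positives:.0f} innocents)")
```

That prints the room of 109 people: 9 terrorists and 100 unhappy innocents.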
Let's do some semi-scientific tinkering - one variable at a time. In a small airport you might get a cluster of 10 terrorists to 1,000 passengers, but that's an amazingly high concentration. Heathrow in 2011 had over 64 million international passengers. That's about 175,000 per day - it's the busiest international airport in the world - so let's say our hypothetical busy airport handles 100,000 innocent passengers in a day and those same 10 terrorists. We're still at 90% sensitivity and specificity, so we again catch 9 terrorists. However, we also catch 10,000 innocent people! That's really not acceptable at all. How do you weed out the 9 terrorists from 10,009 people?
Well, the obvious way is to reduce the number of false positives. We'll try the busy airport with a system that's still 90% sensitive but is now 99% specific. We still catch 9 terrorists, but now we 'only' catch 1,000 innocent people. Better, but hardly great. If we get to 99.99% specific we catch 9 terrorists and only 10 innocent people. Those 10 innocents will still be very unhappy of course, but they are more likely to accept that the inconvenience was reasonable and possibly even justified than 10,000 innocents are. Of course it's easy to say "Oh, ramp up the specificity from 90% to 99.99%" - as a rule of thumb each extra 9 you add increases the cost by a factor of 10. Three more 9s, so your new kit is 1,000 times more expensive. Ouch. But with that, the odds of a flagged person actually being a terrorist have gone from roughly 1 in 10,000 (the base rate in the whole crowd) to pretty much 1 in 2 in your suspect pool.
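The sweep over specificities can be run directly - again using the post's numbers, with the loop structure being my own sketch:

```python
# Busy airport: 100,000 innocents, 10 terrorists, sensitivity fixed at 90%,
# specificity varied to see its effect on false positives.
terrorists, innocents = 10, 100_000
sensitivity = 0.90

for specificity in (0.90, 0.99, 0.9999):
    tp = terrorists * sensitivity        # terrorists caught (unchanged: 9)
    fp = innocents * (1 - specificity)   # innocents wrongly flagged
    print(f"specificity {specificity:.2%}: "
          f"{tp:.0f} terrorists, {fp:.0f} innocents flagged")
```

Each extra 9 of specificity cuts the innocent detainees by a factor of 10: 10,000, then 1,000, then 10.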
Increasing sensitivity by an extra 9 also has a cost of a factor of 10 by the same rule of thumb. It actually has less effect though, in some respects - in all these examples you go from catching 9 of the terrorists to probably catching all 10. In the case of terrorists the extra cost to make sure almost none slip through is probably worth it. In some other cases, not so much. If you're making home pregnancy kits, for example, 10% false negatives is probably OK - there are plenty of other signs that will build up over time, you can keep the costs down, and you can suggest doing 2 or 3 tests if they're not sure - but reducing the false positives is probably worth it to some extent.
Although looking at the formula for Bayes' Theorem is likely to make you cry (unless you like symbolic logic, algebra and probability), it basically lets you plug in the numbers you can measure and get out the useful numbers you want.
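For the curious, the formula is less scary in code form. This is a minimal sketch of Bayes' Theorem for a positive test result - the function name and structure are mine, the numbers are the busy-airport figures from above:

```python
def posterior(prior, sensitivity, specificity):
    """Bayes' Theorem: P(condition | positive test result).

    prior       -- P(condition) before testing (the base rate)
    sensitivity -- P(positive | condition)
    specificity -- P(negative | no condition)
    """
    p_pos_given_true = sensitivity
    p_pos_given_false = 1 - specificity
    # Total probability of a positive result, over both kinds of subject.
    evidence = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
    return p_pos_given_true * prior / evidence

# Busy airport: 10 terrorists among 100,010 travellers,
# with the expensive 90% sensitive / 99.99% specific machine.
print(posterior(10 / 100_010, 0.90, 0.9999))  # about 0.47
```

Even with the very expensive machine, a positive result only means "terrorist" about 47% of the time - because the base rate is so tiny.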
With that most expensive of tests described above, as you interview each suspect it's basically a coin flip whether they're a terrorist or a falsely identified innocent person.
And thus the demand for second tests, confirmation of a suspicion, and so on. The maths gets even more complex, but the more you can cross-confirm with different, independent tests, the more certain you can be that something is really going on. Add 1, 2, 3 or more tests that all independently point the same way and you become more and more certain that you're right.
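The cross-confirmation idea can be sketched as repeated Bayesian updating: feed the posterior from one test in as the prior for the next. This assumes the tests really are independent, which is the hard part in practice; the figures for the cheap follow-up test (90% sensitive, 99% specific) are my own illustrative choice:

```python
def update(prior, sensitivity, specificity):
    # One Bayesian update for a positive result on an independent test.
    evidence = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / evidence

belief = 10 / 100_010  # start from the base rate at the busy airport
for test in range(3):
    # Each positive result on a cheaper 90%/99% test raises our belief.
    belief = update(belief, 0.90, 0.99)
    print(f"after test {test + 1}: belief = {belief:.4f}")
```

Three independent positive results take you from a 1-in-10,000 base rate to well over 98% certainty - which is why your doctor orders follow-up tests rather than buying one impossibly good machine.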
The good news is, for most of us, this stays as expert knowledge. We don't have to know how to do it. But it is still worth having a general idea about this stuff. It's why your doctor will send you for extra tests. And when your government tells you 'it's for your protection' it gives you a way to work out just how effectively it protects you, and gives you some insight into where they might spend extra money to make a real improvement.