Stereotyping, and Rare but Important Events

Phil Arena has an interesting but problematic piece up at Duck of Minerva, entitled “Bayes, Stereotyping, and Rare Events.” The substantive topic of the post is a recent survey of Muslims that I’m not too interested in. But Phil uses statistics to mask a deeply flawed and irrelevant conclusion:

Put simply, the probability that you’d be mistaken to assume that someone who belongs to group Y is likely to commit or have committed act X simply because most such acts are committed by members of group Y grows exponentially higher as X becomes rarer. The reason you should not assume that a person is a terrorist just because they’re Muslim, then, is not just that this is politically incorrect and likely to offend delicate liberal sensibilities. It’s that it’s almost certainly incorrect, full stop.

The first and last sentences in that paragraph have almost nothing to do with each other. Phil’s conclusion is irrelevant, and the “full stop” leaves the most important part of the conclusion unsaid.
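To be fair, the arithmetic behind Phil's claim is sound as far as it goes. Here is a minimal sketch of the base-rate effect he describes; every number in it is hypothetical, chosen only for illustration:

```python
# A minimal sketch of the base-rate arithmetic Phil describes.
# All numbers below are hypothetical, chosen only for illustration.

def posterior(base_rate, p_group_given_x, p_group):
    """Bayes' rule: P(X | group) = P(group | X) * P(X) / P(group)."""
    return p_group_given_x * base_rate / p_group

# Suppose (hypothetically) group Y is 1% of the population but
# accounts for 80% of those who commit act X.
p_group = 0.01
p_group_given_x = 0.80

# As act X becomes rarer, P(X | member of Y) collapses toward zero.
for base_rate in (1e-3, 1e-5, 1e-7):
    print(f"P(X) = {base_rate:.0e}  ->  "
          f"P(X | Y) = {posterior(base_rate, p_group_given_x, p_group):.2e}")
```

The arithmetic is fine. What it leaves out, as we will see below, is any accounting for the costs of being wrong in each direction.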

And Phil’s not alone in such a mistake. Take, for example, a recent statement by Fernando Vila on the NPR program “Tell Me More.” Fernando is responding to a statement that a disproportionate fraction of violent crimes in New York City are committed by African Americans:

VILA: Well, I mean, the notion of paranoia is a good one and Mario’s statistics actually sort of feed into that – into this culture of paranoia. I mean, the vast majority of black people are not committing crimes.

VILA: You know, it’s like to say, I don’t know – the vast majority of hosts on NPR are white males. That doesn’t mean that every time I encounter a white male on the street I assume he’s a host of NPR. You know, it’s just a backwards way of looking at it

Phil and Fernando make exactly the same mistake: falsely assuming that the cost of a “false positive” (accidentally marking someone as suspicious) is the same as the cost of a “false negative” (accidentally marking someone as not suspicious). But the truth is that not all errors are equal.

The cost of a mistake is a function of the severity of the mistake.

Is the cost to society of one false positive (falsely placing an individual under suspicion of terrorism) the same as the cost to society of one false negative (falsely removing suspicion from an actual terrorist)? No, of course not, but Phil’s post is premised on that fallacy. Otherwise his conclusion makes no sense.

There is a serious question as to where we should become indifferent to the trade-off (10:1? 100:1? 1,000,000:1?), but it is certainly not 1:1.
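To make that concrete, here is a minimal sketch of the standard expected-cost rule for a binary decision: flag an individual when the expected cost of a false negative exceeds the expected cost of a false positive. The cost ratios below are hypothetical:

```python
# A minimal sketch of how the false-negative : false-positive cost
# ratio moves the decision threshold. The ratios are hypothetical.

def flag_threshold(cost_ratio):
    """Flag when p * C_fn > (1 - p) * C_fp, which rearranges to
    p > 1 / (1 + r), where r = C_fn / C_fp."""
    return 1.0 / (1.0 + cost_ratio)

for r in (1, 10, 100, 1_000_000):
    print(f"C_fn : C_fp = {r}:1  ->  flag when P(X | evidence) > "
          f"{flag_threshold(r):.2e}")
```

At a 1:1 ratio, Phil’s “almost certainly incorrect, full stop” follows. At 100:1 or steeper, the same tiny posterior from the earlier sketch can clear the threshold. The statistics alone cannot tell you which ratio is right, and that is exactly the part the “full stop” hides.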

Likewise, Fernando’s statement on NPR is irrelevant. While the consequences of misjudging an individual’s employment status at NPR might well be symmetric (few would care either way), the cost of falsely assuming someone will attack you is far lower than the cost of falsely assuming someone will not attack you. Again, there is a question of where the trade-off lies (1,000:1? 10,000:1? 1,000,000:1?), but the costs of errors are not identical.

Now, Phil and Fernando obviously had different motives here. Phil is trying to popularize some basic statistics, while Fernando is doubtless ignorant of them. But in both cases an unwary audience will be led astray into thinking all errors are equally important.