# Variation, Within and Between

Occasionally, you will encounter someone who says this:

Variation between human races is greater than variation within human races

If you do, you know you’ve encountered someone who has been indirectly exposed to the work of Richard Lewontin.

There are two forms of “Lewontin’s fallacy.” One is the original claim that Lewontin made; it is demonstrably untrue, which becomes obvious once it is examined with graduate-level statistical tools. A later, weaker version is simply nonsensical. I’ll address these in order.

Lewontin’s Original Fallacy

In 1972, Lewontin published an article called “The Apportionment of Human Diversity,” using blood group proteins. The work is fairly typical for its time, except for its politically correct connotations, and so it eventually took on a life of its own. Rather than discuss the original article, which has been thoroughly debunked and which bizarrely focuses on blood proteins anyway, here’s an analogy. (I’m too tired to do matrix algebra now.)

Say someone comes to you, and says this:

“The racial groups that map to what we consider ‘East Asian’ or ‘Caucasian’ do not exist. There is no attribute of either race you can find in which the majority of variation is between races, rather than within races. Hair, skin tone, skeletal shape, and so on all vary within both populations, so that means there is only one population.

“In other words, the groups ‘East Asian’ and ‘Caucasian’ are entirely social groups. It is impossible to write a machine learning system to tell an East Asian apart from a European if you don’t include purely social constructs like name, clothing style, and so on.”

The obvious refutation (which mathematically requires matrix algebra) is to ask why in the world you would use only one dimension of variation (like height, or skin tone) to classify individuals into populations.

You can just use multiple indicators, together. That way if there has been a murder, say, and the corpse has been stripped of clothing and identification and has been dumped, you can use multiple indicators together to determine the race of the victim.

If there is DNA evidence, you can do the same.

Indeed, you can do the same with “races” such as “German” and “French”!

If for some reason you’re transported back to the 1970s, and all you have is blood proteins, you can do the same.

The solution to Lewontin’s fallacy is to use multiple indicators together, and not just one.
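A minimal sketch of this point, with entirely invented simulated data: even when the between-group share of variance on every single trait is tiny, a classifier that combines many traits together can separate the groups reliably. The group sizes, trait counts, and mean shift below are all assumptions chosen for illustration.

```python
import numpy as np

# Simulated (invented) data: two populations whose means differ by only
# 0.3 within-group standard deviations on each of 100 traits.
rng = np.random.default_rng(0)
n_traits, n = 100, 1000
shift = 0.3

group_a = rng.normal(0.0, 1.0, size=(n, n_traits))
group_b = rng.normal(shift, 1.0, size=(n, n_traits))

# Per trait, the between-group share of total variance is tiny (~2%).
between = (group_a.mean(0) - group_b.mean(0)) ** 2 / 4
total = np.vstack([group_a, group_b]).var(0)
print("mean between-group share per trait:", (between / total).mean().round(3))

# But a nearest-centroid classifier using all 100 traits at once is accurate.
ca, cb = group_a.mean(0), group_b.mean(0)
test_a = rng.normal(0.0, 1.0, size=(n, n_traits))
test_b = rng.normal(shift, 1.0, size=(n, n_traits))

def is_b(x):
    # True when a row is closer to group B's centroid than to group A's.
    return np.linalg.norm(x - cb, axis=1) < np.linalg.norm(x - ca, axis=1)

accuracy = np.concatenate([~is_b(test_a), is_b(test_b)]).mean()
print("multivariate classification accuracy:", accuracy.round(3))
```

Single-trait overlap and easy multivariate classification coexist: each small mean difference points in the same direction, so across many traits the populations separate cleanly.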

These days, it seems crazy to suggest it would be impossible to tell the race of an individual from DNA. There’s even a popular PBS show about the concept! But in the 1970s, some people really were that ignorant.

The Remnant that Remains

There’s no reason to take Lewontin’s original fallacy seriously, but sometimes you’ll hear a variation of it:

Variation in intelligence between human races is greater than the mean difference of intelligence of the races

This is like saying moisture is taller than speed. It makes no sense.

In some areas of life, differences in variation between groups are the fact that matters most. For instance, on many measures (say, IQ or time orientation), males have greater variation than females, while both tend to have the same average. From this you would expect to see many more male violent criminals than female violent criminals, and also more male CEOs of large companies than female CEOs of large companies. There is little if any difference in the averages of these traits between the sexes; there is, though, a substantial difference in their variation.
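The tail effect described above can be sketched with invented numbers: two groups with the same mean but different spreads, where the higher-variance group dominates the extreme tail.

```python
import numpy as np

# Invented example: same mean (100), different spreads.
rng = np.random.default_rng(1)
n = 1_000_000
group_a = rng.normal(100, 15, n)  # higher-variance group
group_b = rng.normal(100, 12, n)  # lower-variance group

cutoff = 145  # an extreme threshold, 3 SDs above the mean for group A
tail_a = (group_a > cutoff).mean()
tail_b = (group_b > cutoff).mean()
print(f"share of group A above {cutoff}: {tail_a:.5f}")
print(f"share of group B above {cutoff}: {tail_b:.5f}")
print(f"group A is overrepresented {tail_a / tail_b:.0f}-fold in the tail")
```

Despite identical averages, the wider distribution supplies many times more individuals past any extreme cutoff, in either direction.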

In other areas, averages matter. For instance, the average IQ of American whites from the south-eastern United States is lower than the average IQ of American whites from the northern states. From this you might wonder whether large companies have a disproportionately small number of CEOs from the American South, and whether white southerners have responded to this “dixie ceiling” by organizing politically to obtain political goods that they cannot gain in the marketplace.

I have never seen anyone talk, in a popular setting, about a comparison between a variation on the one hand and an average on the other. Typically one or the other is relevant to the conversation, and bizarre second-order comparisons (what is the variability in height of Australians compared to the average height of South Americans?) are simply uninterpretable. But if you’ve never worked with variation as a real thing (through calculating a standard deviation to solve a problem, say), the remnant of the fallacy is a good guess by an ignorant layman at what Lewontin may have been talking about.

Conclusion

The phrase “Variation between human races is greater than variation within human races” is meaningless. It refers either to an empirically incorrect claim from the 1970s, on the impossibility of using “blood proteins” to predict race, or to an incoherent claim that compares averages against variation.

# The Language of Theory, or, How to Escape the Humanities Ghetto

This morning I read an article by Patrick Thaddeus Jackson and Daniel Nexon, titled “Paradigmatic Faults in International-Relations Theory.” This piece originally appeared in a 2009 issue of International Studies Quarterly.

I like it when people agree with me, so I was pleased to see my words echoed across time (it’s as if Jackson and Nexon read my post, built a time machine, and told their former selves what a great idea they had read on tdaxp). Yesterday, I said it was ridiculous to describe the International Relations cliques of “Realism,” “Liberalism,” and such as paradigms. I wrote:

The highlighted passage, originally by Daniel Maliniak, simply means that empirical research is increasing, and that non-empirical research is declining, within political science. But Maliniak, and thus Walt and Mearsheimer, bizarrely use “paradigmatic” to refer to less paradigmatic (that is, less capable of progress) fields, and “non-paradigmatic” to refer to more paradigmatic (that is, more capable of progress) fields.

Political science has been in the fever swamp for so long that the notion of progress as an outcome of normal science has almost entirely been lost. If Walt and Mearsheimer had their way, it might be lost entirely, and the field simply divided up by a stationary oligarchy of old-boy networks.

As Jackson and Nexon write:

The terminology of “paradigms” and “research programmes” produces a number of deleterious effects in the field. It implies that we need to appeal to criteria of the kind found in MSRP in order to adjudicate disputes that require no such procedures. In order to do so, we spend a great deal of time specifying the “boundaries” of putative research programmes and, in effect, unfairly and misleadingly holding scholars accountable for the status of theories they often view as rivals to their own.

Perhaps the most well-known instance of this kind of boundary-demarcation occurs in the debates surrounding “realism” in international relations theory. The proliferation of countless lists of the “core commitments” of a realist “paradigm,” by adherents and critics alike, shifts the focus of scholarship away from any actual investigation of whether these commitments give us meaningful leverage on the phenomenal world, and instead promotes endless border skirmishes about who is and is not a realist (Legro and Moravcsik 1999), whether predictions of balancing are central to the “realist paradigm” (Vasquez 1998:261–65), and so forth. Such debates and demarcations not only distract us from the actual study of world politics, but also harm disputes over international relations theory by solidifying stances that ought to remain open to debate and discussion.

So I enjoyed Jackson’s and Nexon’s takedown of the so-called “paradigms” in International Relations.

But they don’t go far enough.

Their piece ends with an appeal to Max Weber (how non-progressive can you get?!?) and an unfalsifiable taxonomy that I won’t go into.

A more useful conclusion to the paper would have been to recognize that statistics is the language of theory, the language of modeling. Instead of inviting international relations scholars to chase their own tail and bow to Max Weber and the dead, how much more useful would a positive theory of research programs in International Relations have been? For instance, consider a citation-indexing method, such as PageRank [pdf], to determine whether there are clusters in which certain articles were influential (exemplars?) and others were not. Did Jackson and Nexon really have no one available to sketch even a proposed methodology for testing their claim?

The answer is probably “no.” My purpose isn’t to pick on Jackson and Nexon, but to point out the weakness of International Relations as a whole. In a related post by Patrick Musgrave, titled “The Crass Argument for Teaching More Math in Poli Sci Courses,” the following diagram is shown:

The diagram clearly displays a “humanities ghetto,” one that includes political science.

How can this be, if International Relations is the disciplined extraction of meaning from data, which is the same focus as the high-paying, well-employed fields?

The obvious answer is that International Relations does not teach actually useful methods for the disciplined extraction of meaning from data. It does not teach critical thinking or logical reasoning. It teaches something that apes these skills: a rhetorical ability that impresses old scholars and does not help society.

International Relations is a non-progressive field where, by and large, it sucks to be young.

In an evocative comment that ties the article and the blog post together, Patrick Thaddeus Jackson states:

I don’t think that it is our job as university faculty to increase students’ future earning potential. Nor do I think that it is our job in teaching PoliSci undergrads to make sure that they can read APSR in the 1980s and 1990s. Our job is to teach students to think critically about politics, and while I am perfectly fine with the suggestion that some statistical literacy can be useful to that end, I am not prepared to give that higher pride of place than things like reading closely, writing cogently, and disagreeing with one another civilly.

The dichotomy that Jackson notes is entirely false. In his own piece, he was not able to express a constructive critical thought about paradigms; the original Nexon and Jackson article is devoid of the model specification or operationalization that would be needed to turn his criticisms and taxonomy into something capable of progress. Any competent graduate from the humanities ghetto can read “closely” or write “cogently.” What’s needed is to think usefully, and for this statistical literacy is required.
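What might the positive, testable methodology asked for above look like? Here is a minimal sketch of the PageRank idea applied to a toy, entirely invented citation graph; the paper names and citation links are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical citation graph: five papers, edges mean "cites".
papers = ["A", "B", "C", "D", "E"]
cites = {"B": ["A"], "C": ["A", "B"], "D": ["B"], "E": ["C", "D"]}  # A cites nothing

n = len(papers)
idx = {p: i for i, p in enumerate(papers)}

# Column-stochastic link matrix: column j spreads paper j's score
# evenly among the papers it cites.
M = np.zeros((n, n))
for citer, cited in cites.items():
    for p in cited:
        M[idx[p], idx[citer]] = 1 / len(cited)
M[:, idx["A"]] = 1 / n  # a paper citing nothing spreads its score evenly

d = 0.85                      # standard damping factor
rank = np.ones(n) / n
for _ in range(100):          # power iteration to the stationary ranking
    rank = (1 - d) / n + d * M @ rank
print(dict(zip(papers, rank.round(3))))
```

Papers cited by influential papers score highest; clusters of mutually citing articles with a few high scorers would look a lot like “schools” with exemplars, which is the kind of empirical claim a critic could actually test.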

# Go-Up or Go-Next?

The idea that there are two cultures in academic life, a culture focused on the humanities and another on science, is not a new one. The famous “Two Cultures” lecture is more than fifty years old, and Brother Guy Consolmagno identifies instances of the two cultures in medieval Catholic Europe in his book of adventures.

Jason Lee Steorts, a writer for National Review Online, defended NRO’s dismissal of John Derbyshire, and demonstrated that divide by criticizing Derbyshire’s controversial article as hypocritical. In 2012, Derbyshire wrote in paragraph 4:

The default principle in everyday personal encounters is, that as a fellow citizen, with the same rights and obligations as yourself, any individual black is entitled to the same courtesies you would extend to a nonblack citizen. That is basic good manners and good citizenship. In some unusual circumstances, however (e.g., paragraph (10h) below), this default principle should be overridden by considerations of personal safety.

While two years previously, in a speech on race relations, he said:

Group differences are statistical truths. They exist in an abstract realm quite far removed from our everyday personal experience. They tell you nothing about the person you just met.

This would be hypocrisy, unless you believe the fundamental principles of statistics have undergone a revolution in the past two years. Which, of course, they have.

There are two ways of understanding statistics. The terms “Frequentist” and “Probabilistic” are thrown around here, but to me those words are more confusing than helpful. So I will call them the go-up and go-next views of statistics.

The Go-Up view of statistics is that statistics measures the population from which an observation comes. The appropriate way to go up is to wait until you have a sufficient number of observations, and then generalize about the population from those observations. This is the method that Derbyshire was describing in 2010. A large number of observations of academic performance show consistent gaps between black and white learners. Because we’re “going up” from observations to populations, we can conclude some things about the population, and about how outcomes in the population should work out overall, but it makes no sense to try to predict any given student’s success on this basis. We’re going up, not going next.

The Go-Next view of statistics is that statistics gives us the likelihood of something being true, based on what has come before. In Go-Next statistics, population averages are beside the point. What matters is guessing what’s going to happen next, based on what you’ve seen before. The whole point is to guess what’s going to work for individuals you know only a few things about, based on your experience with other individuals who shared some things with the new strangers.

Both the Go-Up and Go-Next interpretations of statistics are hundreds of years old. Go-Up statistics strikes many as more beautiful; Go-Next as, perhaps, more practical, more commercial, more technical. Astronomers use go-up statistics. Weathermen use go-next statistics.
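The distinction can be sketched in a few lines (all numbers invented): from the same sample, a go-up question asks how precisely we know the population mean, while a go-next question asks where the next individual observation will fall.

```python
import numpy as np

# Invented sample: 400 observations from a roughly normal trait.
rng = np.random.default_rng(2)
sample = rng.normal(100, 15, 400)

mean, sd, n = sample.mean(), sample.std(ddof=1), len(sample)

# Go-up: 95% interval for the *population mean* -- narrow, because
# 400 observations pin the mean down tightly.
ci = (mean - 1.96 * sd / np.sqrt(n), mean + 1.96 * sd / np.sqrt(n))

# Go-next: 95% interval for the *next individual* -- wide, because one
# person can sit anywhere in the distribution.
pi = (mean - 1.96 * sd * np.sqrt(1 + 1 / n), mean + 1.96 * sd * np.sqrt(1 + 1 / n))

print("confidence interval for the mean:", np.round(ci, 1))
print("prediction interval for the next person:", np.round(pi, 1))
```

The population mean is known to within a few points, yet the next individual's value remains uncertain over a range several times wider: group-level precision tells you little about the person you just met, and predicting that person is a different calculation, not an impossible one.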

The Internet changed everything.

Academics pay attention to reality. Professors, like most people, respond to the incentives of power, influence, and money. Companies like Google, Facebook, Apple, and my employer do not care much about abstract ideas like “What can we infer about internet users in general based on the observations we collect.” Instead, they care, very, very deeply, about making you delighted. Because people will spend money to be delighted.

When you log onto your Facebook screen, or type a search into Google, or click the genius buttons in iTunes, you want it to just work. You want the perfect update, the perfect site, the perfect song. Advertisers want the perfect ad for you.

In this context, the view of statistics that Derbyshire outlined in 2010:

Group differences are statistical truths. They exist in an abstract realm quite far removed from our everyday personal experience. They tell you nothing about the person you just met.

Is just stupid. Facebook doesn’t care about the group differences between men and women. It cares that when you log in, it can give you an update from your favorite sports team, or gossip from your favorite celebrity, or whatever. Never before in history has so much math been used to make you happy.

It’s all about guessing, based on what has come before, what’s best for you.

It’s all about guessing, based on prior observations, who you are, what you will do, and what you will like.

These major companies have been hiring people with statistical literacy very heavily for more than a decade. Professors, who seek money, fame, and power, know what these large potential sources of money, fame, and power want, and teach their products (their students) accordingly.

The superstructure of science changes as the infrastructure of the economy changes. The Go-Next philosophy of statistics, once the peasant stepchild of the serene Go-Up interpretation, now reigns supreme.

The unfolding victory of Go-Next Statistics matters much, much more than, say, the Copernican Revolution. The number of people whose daily conversations were actually impacted by Copernicus may have been a few dozen, all involved in the Papal-Academic complex.

How many times a day does Facebook’s decision of which news to share impact you?

How many times a day does Google’s decision of what sites to show impact you?

How many times a day does your iPod’s decision of what music to play impact you?

Now, back to Derbyshire.

Mr. Derbyshire was born in 1945. His training is in Go-Up statistics. It took a complete revolution in statistics to change his view of it. That view clearly changed in the last two years.

We’ve all lived through the revolution of Go-Next statistics. Derbyshire realizes it. Steorts, clearly, does not.

There are two cultures of knowledge, the humanities and the sciences. Part of Derbyshire’s intention of writing “The Talk – Nonblack Version” appears to have been to highlight this. If so, I think he succeeded.

# The R Statistics Language

R (also called GNU R, or even GNU S) is the open-source version of the S programming language, a language that fulfills the same statistical needs as SAS and SPSS. While SAS is a macro language designed for statistics, and SPSS is a macro language designed for statistics with a very nice graphical front-end, R looks like a dialect of C, acts like a dialect of LISP, and functions as a nifty alternative to SPSS and SAS. As I come from a programming background, R is beautiful in concept.

By far the coolest part of R, for me, is principal component analysis (PCA): learning which unknown unknowns you forgot to solve for. For instance, a bundle of seemingly meaningless data can be examined through a “scree plot” to see which of the things you forgot to measure (“latent variables”) mattered, and which did not.

Unknown Unknowns? That’s the R
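The scree-plot idea can be sketched outside R as well. Here is a minimal Python/numpy version with simulated data, in which ten measured variables are secretly driven by two latent factors plus noise; the variance-explained numbers fall off a cliff after the second component, which is the “elbow” a scree plot shows.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: 500 observations of 10 variables, secretly generated
# by just 2 latent factors plus a little noise.
n, p = 500, 10
factors = rng.normal(size=(n, 2))    # the hidden (latent) variables
loadings = rng.normal(size=(2, p))   # how each factor drives the 10 measures
data = factors @ loadings + 0.3 * rng.normal(size=(n, p))

# PCA via eigendecomposition of the covariance matrix.
cov = np.cov(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # largest first
explained = eigvals / eigvals.sum()
print("variance explained per component:", explained.round(3))
```

The first two components soak up nearly all the variance and the rest are noise-sized, telling you there were two latent variables behind the ten measurements, even if you never knew to measure them directly.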

# Avandia has a moderate-to-very-large practical effect on heart failure

Home, P.D., et al. “Rosiglitazone evaluated for cardiovascular outcomes: an interim analysis.” The New England Journal of Medicine, 5 June 2007. Available online: http://content.nejm.org/cgi/content/full/NEJMoa073394 (via Medical News Today).

Avandia is a drug designed to treat Type 2 diabetes. Type 2 diabetes leads to heart attack, death, and a lot of other bad things, so a safe drug that treats it would be very good. Many people think that Avandia (rosiglitazone maleate) is that drug. However, a recent article in The New England Journal of Medicine reported that Avandia has a large-to-very-large effect on patient death. Because this is important news, a new article was rushed to the New England Journal reporting the results so far of a study that is not yet complete.

The results section is statistics-y:

Because the mean follow-up was only 3.75 years, our interim analysis had limited statistical power to detect treatment differences. A total of 217 patients in the rosiglitazone group and 202 patients in the control group had the adjudicated primary end point (hazard ratio, 1.08; 95% confidence interval [CI], 0.89 to 1.31). After the inclusion of end points pending adjudication, the hazard ratio was 1.11 (95% CI, 0.93 to 1.32). There were no statistically significant differences between the rosiglitazone group and the control group regarding myocardial infarction and death from cardiovascular causes or any cause. There were more patients with heart failure in the rosiglitazone group than in the control group (hazard ratio, 2.15; 95% CI, 1.30 to 3.57).

Several results are reported here. The most important to consider are practical significance and statistical significance. From my statistics notes:

- Statistical significance is concerned with whether an observed mean difference could likely be due to sampling error.
- Practical significance is concerned with whether an observed effect is large enough to be useful in the real world.

For instance, imagine that you wish to be more productive, so you buy a new computer. You notice that you get twice as much done in an hour with the computer as without it. The practical significance would be very large (double!). However, if you haven’t observed enough people to rule out a fluke, there would be no statistical significance.
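A sketch of that computer example with invented numbers: three observations per condition, a doubled average, and yet a t-statistic that falls short of the (approximate) critical value, so the large practical effect is not statistically significant.

```python
import numpy as np

# Invented data: tasks per hour for three people, without and with the new computer.
old = np.array([9.0, 11.0, 10.0])
new = np.array([14.0, 26.0, 20.0])  # roughly double on average

diff = new.mean() - old.mean()
se = np.sqrt(new.var(ddof=1) / len(new) + old.var(ddof=1) / len(old))
t = diff / se

# With roughly 2 degrees of freedom (Welch approximation), the 5%
# two-sided critical value is about 4.30.
print(f"observed difference: {diff:.1f} tasks/hour, t = {t:.2f}")
print("statistically significant at 5%?", abs(t) > 4.30)
```

Doubling the sample size a few times would shrink the standard error until the same doubled average easily cleared the bar, which is exactly the practical-versus-statistical distinction.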

A similar thing happened in this study, for its primary end point. The “hazard ratio, 1.08” means the overall difference in adjudicated cardiovascular events between the groups was small, and the interim analysis had limited power to say whether even that small difference was due to chance: neither practical nor statistical significance. The heart failure result is different. The “hazard ratio, 2.15” means that, practically speaking, for every case of heart failure among Type 2 diabetes patients not taking Avandia, patients taking Avandia suffered 2.15 cases; and since the confidence interval (1.30 to 3.57) does not include 1, this large practical effect is unlikely to be due to chance alone.

A problem with the study, which the authors note, is that they are reporting their results too soon. (They are doing this because there is talk of forcing Avandia off the market, which would affect all patients who currently take Avandia and would obviously hurt GlaxoSmithKline, the company that makes it.) I have heard anecdotes that one of the side effects of Avandia is “premature aging.” If this is true, the negative effects of Avandia would get worse and worse over time. Thus, future research may move from the current situation, where the studies so far find practical significance but not always statistical significance, to one where all find statistical significance.