My friend Adam Elkus recently asked what made Structural Equation Modeling (SEM) powerful for testing theories, besides the ability to test for null results.
This is my answer.
What I like about SEM is that it lets you build models that reflect your theories more closely than any other method I know. Other methods introduce a greater source of unmeasurable error (model error) than SEM, because they force you to take your theories, translate them into another form, and then test those translations.
Take the example theory (which is crazy):
“While democracy exhibits substantial inertia (more democratic places stay more democratic, less democratic places stay less democratic), communication technology forces us to reshape our understanding of how democracy grows or declines. Within any community, outside international pressure acts on the growth or decline in the strength of democratic institutions entirely through smartphone connectivity.
“By strength of democratic institutions I mean such things as the average turnover of political offices, the number of political questions per year voters are asked to consider, and the percentage of major editorials that are critical of government policy. By international pressure I mean UN resolutions that mention a country, statements by foreign ministers that reference a country, and the number of applications for McDonald’s franchises that were rejected. By smartphone connectivity I mean the fraction of the population that has smartphones, the average number of web impressions per person to Wikipedia, and the average number of hours per day individuals spend playing Angry Birds.”
OK, let’s create the SEM for it. The generic measurement model for democracy, for time0 and time1, is (converted to a pseudo-Mplus language):
LATENT democracy0 (float);
MANIFEST democracy0 ONTO politicalTurnover0 (float); // [0...1]
MANIFEST democracy0 ONTO politicalQuestions0 (int); // [0...n]
MANIFEST democracy0 ONTO criticalEditorials0 (int); // [0...n]
LATENT democracy1 (float);
MANIFEST democracy1 ONTO politicalTurnover1 (float); // [0...1]
MANIFEST democracy1 ONTO politicalQuestions1 (int); // [0...n]
MANIFEST democracy1 ONTO criticalEditorials1 (int); // [0...n]
// … And
LATENT smartphoneConnectivity (float);
MANIFEST smartphoneConnectivity ONTO ownershipRate (float); // [0...1]
MANIFEST smartphoneConnectivity ONTO wikipediaRate (float); // [0...n]
MANIFEST smartphoneConnectivity ONTO angryBirds (float); // [0...24]
LATENT internationalPressure (float);
MANIFEST internationalPressure ONTO unResolutions (int); // [0...n]
MANIFEST internationalPressure ONTO fmCriticisms (int); // [0...n]
MANIFEST internationalPressure ONTO mcRejections (int); // [0...n]
// no time0/time1 split here, because we’re assuming that smartphoneConnectivity completely mediates the change in democratic trajectory that isn’t the result of inertia
// OK, now we’d create our latent model
democracy0 LOADS ONTO democracy1; // the inertia of democracy
internationalPressure LOADS ONTO smartphoneConnectivity; // … smartphones mediate international pressure..
smartphoneConnectivity LOADS ONTO democracy1; // .. onto democracy
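For anyone who wants something closer to runnable than the pseudo-Mplus above, here is a minimal Python sketch that holds the same model as plain data and prints it in lavaan-style text (`=~` for "is measured by", `~` for "is regressed on"). It assumes democracy1 is measured by time1 versions of the three democracy indicators, and `to_lavaan_syntax` is a hypothetical helper written for this post, not a function from any SEM package. Nothing is fitted here.

```python
# Sketch only: the model from the listing above as plain Python data.
# Assumption: democracy1 is measured by time1 versions of the indicators.
MEASUREMENT = {
    "democracy0": ["politicalTurnover0", "politicalQuestions0", "criticalEditorials0"],
    "democracy1": ["politicalTurnover1", "politicalQuestions1", "criticalEditorials1"],
    "smartphoneConnectivity": ["ownershipRate", "wikipediaRate", "angryBirds"],
    "internationalPressure": ["unResolutions", "fmCriticisms", "mcRejections"],
}

# (predictor, outcome) pairs for the latent model.
STRUCTURAL = [
    ("democracy0", "democracy1"),                         # inertia
    ("internationalPressure", "smartphoneConnectivity"),  # pressure reaches smartphones
    ("smartphoneConnectivity", "democracy1"),             # smartphones reach democracy
]


def to_lavaan_syntax(measurement, structural):
    """Hypothetical helper: render the model as lavaan-style text."""
    lines = [f"{latent} =~ {' + '.join(inds)}" for latent, inds in measurement.items()]
    # Group predictors by outcome so each outcome gets one regression line.
    by_outcome = {}
    for predictor, outcome in structural:
        by_outcome.setdefault(outcome, []).append(predictor)
    lines += [f"{outcome} ~ {' + '.join(preds)}" for outcome, preds in by_outcome.items()]
    return "\n".join(lines)


MODEL_TEXT = to_lavaan_syntax(MEASUREMENT, STRUCTURAL)
print(MODEL_TEXT)
```

Note that the printed text has no line regressing democracy1 directly on internationalPressure. That missing line is the "entirely through smartphone connectivity" claim of the theory. Actually fitting the model would also need a data set with one column per indicator.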
Because SEM allows us to translate the model of our theories so faithfully into the model of code, we now have a serious question that isn’t obvious from the paragraph, but is more obvious when we’re in the process of writing it down.
Because you see your theory in “code” (or matrices, if you insist on the algebraic way to do this, which I’ve only used for class), assumptions or mistakes jump out more. For example, in the first draft of this email I said that smartphones mediate onto democracy, but I didn’t define what they mediated; hence the inclusion of international pressure.
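That kind of slip, a mediator with nothing to mediate, can even be checked mechanically once the paths are written down. A minimal sketch, assuming the latent paths are stored as (predictor, outcome) pairs; `dangling_mediators` is a hypothetical helper made up for illustration, not part of any SEM package.

```python
# Sketch only: flag variables claimed as mediators that lack an incoming
# or an outgoing path. All names here are illustrative assumptions.

def dangling_mediators(paths, mediators):
    """Return the mediators that do not sit between two other variables."""
    has_incoming = {outcome for _, outcome in paths}
    has_outgoing = {predictor for predictor, _ in paths}
    return sorted(m for m in mediators if m not in has_incoming or m not in has_outgoing)


MEDIATORS = ["smartphoneConnectivity"]

# First draft: smartphones "mediate onto democracy", but mediate what?
FIRST_DRAFT = [
    ("democracy0", "democracy1"),
    ("smartphoneConnectivity", "democracy1"),
]

# Revised: international pressure gives the mediator something to carry.
REVISED = FIRST_DRAFT + [("internationalPressure", "smartphoneConnectivity")]

print(dangling_mediators(FIRST_DRAFT, MEDIATORS))  # ['smartphoneConnectivity']
print(dangling_mediators(REVISED, MEDIATORS))      # []
```

The check only looks at the shape of the model, not at any data, which is the point: the mistake is visible before anything is estimated.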
(A more common format is a left-to-right flow of manifest predictor indicators, latent predictor factors, latent outcome factors, and manifest outcome indicators. The format above was chosen to fit well on my blog page, and because I was too impatient to take the time to make it look better.)
As I said, the theory’s crazy — but SEM allows that theory to be translated into a model that can be directly tested, and frees us from having to waste our time with hacks like ANOVA, multiple regression, or dead theory disconnected from reality.