The R Statistics Language
by tdaxp ~ September 20th, 2008
R (also called GNU R, or even GNU S) is the open-source implementation of the S Programming Language, a language which fulfills the same statistical needs as SAS and SPSS. While SAS is a macro language designed for statistics, and SPSS is a macro language designed for statistics with a very nice graphical front-end, R looks like a dialect of C, acts like a dialect of LISP, and functions as a nifty alternative to SPSS and SAS. As I come from a programming background, R is as beautiful in concept as Lady of tdaxp is in form.
R’s learning curve is steep. If Perl tries to make ‘easy things easy and hard things possible,’ then R’s philosophy seems to be ‘make hard things easy and easy things hard.’ Some procedures that are complex and tedious in SPSS and SAS take a single line in R, such as multiplying the inverse of a correlation matrix by its loadings as determined by a one-factor Principal Component Analysis, or PCA: solve(ad.data.cor) %*% as.matrix(principal(ad.data.cor,nfactors=1)$loadings), where principal() comes from the psych package. Other tasks require a deeper understanding of the material, however. For example, in SPSS creating a ‘Component Score Coefficient Matrix’ after a PCA is as simple as ticking a check box, or adding a simple request in the macro code. In R, you need to realize that the Component Score Coefficient Matrix is actually just the inverse of the correlation matrix multiplied by the loadings from the PCA, which is exactly what the line above computes.
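For anyone who wants to see the pieces without installing anything, here is a minimal sketch of the same calculation in base R. It is an illustration under assumptions, not the code from my coursework: the data (four columns of the built-in mtcars dataset, stored in a hypothetical ad.data) is a stand-in, and the PCA is done by hand with eigen() rather than with principal(). The sign of a component is arbitrary, so the result may differ from other software by a factor of -1.

```r
# Hypothetical stand-in data: four columns of the built-in mtcars dataset.
ad.data <- mtcars[, c("mpg", "disp", "hp", "wt")]
ad.data.cor <- cor(ad.data)

# Unrotated PCA by hand: eigendecomposition of the correlation matrix.
eig <- eigen(ad.data.cor)

# Loadings of the first component: first eigenvector scaled by the
# square root of its eigenvalue.
loadings1 <- eig$vectors[, 1, drop = FALSE] * sqrt(eig$values[1])
rownames(loadings1) <- colnames(ad.data.cor)

# Component score coefficient matrix: inverse of the correlation
# matrix multiplied by the loadings.
score.coef <- solve(ad.data.cor) %*% loadings1
print(score.coef)
```

Because the loadings are an eigenvector times the square root of its eigenvalue, multiplying by the inverse correlation matrix simply divides that eigenvector by the square root of the eigenvalue instead, which is a handy sanity check on the output.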
By far the coolest part of R and PCA is learning what unknown unknowns you forgot to solve for. For instance, a bundle of seemingly meaningless data can be examined through a ‘scree plot,’ to see which things you forgot to measure for (‘latent variables’) mattered, and which did not.
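A scree plot is just the eigenvalues of the correlation matrix plotted in descending order. The sketch below draws one in base R, again using a hypothetical ad.data built from the built-in mtcars dataset as stand-in data. The dashed line at 1 marks a common rule of thumb (keep components whose eigenvalue exceeds 1); treat it as a convention, not a law.

```r
# Hypothetical stand-in data: four columns of the built-in mtcars dataset.
ad.data <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Eigenvalues of the correlation matrix, returned in decreasing order.
eigenvalues <- eigen(cor(ad.data))$values

# Scree plot: one point per component, connected by lines.
plot(eigenvalues, type = "b",
     xlab = "Component number", ylab = "Eigenvalue",
     main = "Scree plot")

# Reference line for the eigenvalue-greater-than-1 rule of thumb.
abline(h = 1, lty = 2)
```

Components that sit well above the point where the curve flattens out are the candidates for latent variables that mattered; the ones along the flat tail mostly did not.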
Unknown Unknowns? That’s the R



September 21st, 2008 at 1:52 am
Since I don’t think you’ll get many replies to this post, I thought you might be interested in this article from edgedotcom.
It’s titled: What makes people vote Republican?
It was written by a psychologist and he attempts to psychoanalyze GOP voters. There’s also a discussion after the article with much anti-GOP snobbery.
September 21st, 2008 at 1:53 am
Here’s the link:
http://www.edge.org/3rd_culture/haidt08/haidt08_index.html
September 21st, 2008 at 4:13 pm
I enjoy the technical posts, tdaxp. I’ll look into tinkering with R, my interest being piqued by the reference to it acting like a dialect of LISP. (Been doing declarative programming in past months.)
But to my real question: is Lady of tdaxp more C (imperative) or LISP (declarative)? ;-\
September 22nd, 2008 at 12:49 pm
LOL!
Great comments, on this un-comment-able post!
Seerov,
The post makes me think of Cliodynamics, the attempt to build a science of history. [1] It also makes me think of the finding that conservatives can more easily engage in concerted collective action. [2]
Moon,
I am taking two statistics classes this semester, and I am trying to follow along in both of them with R. I find it helps my understanding a lot… it forces me to go over parts of the work several times, and also makes me understand conceptually what a statistical function in SPSS or SAS actually does, when I need to recode it in R.
[1] http://www.gnxp.com/blog/2008/08/cliodynamics-rise-fall-of-empires-and.php
[2] http://scienceblogs.com/gnxp/2008/09/conservatives_have_more_fear.php#more
September 22nd, 2008 at 5:07 pm
Are you using R because it’s required, or as a masochistic attempt to learn a new software?
September 22nd, 2008 at 5:43 pm
“Great comments, on this un-comment-able post!” (Dan)
It’s not that I don’t find this post interesting, I just don’t have much to add. I’ve worked with SPSS and have also done statistics in Excel, but I don’t know much about programming and can’t add anything of much value to the discussion. I figured this would be one of your posts where there was little in reader replies?
September 23rd, 2008 at 8:19 am
Seerov,
Haha! Only having fun!
I’ve never figured out the secret of which posts are commented, and which are not. (Perhaps I could use R to find out?!)
September 26th, 2008 at 6:48 am
Michael,
Because it’s free, and a subscription to SPSS costs $200!
If I have to go out of my way to do something (either learn R, or do my statistics in a university lab), I’d rather walk in the way that teaches me more.
R certainly does.
September 26th, 2008 at 3:40 pm
How has pre-packaged software like Minitab or Statistica worked out for you?
September 27th, 2008 at 1:42 pm
Michael,
Negative. Some folks here use Minitab, but again if I’m going out of my way, I want to learn deeply.
PS: Your comment went through straight-away, without being flagged by Akismet! Congratulations!