
The R Statistics Language

R (also called GNU R, or even GNU S) is the open-source version of the S programming language, a language which fulfills the same statistical needs as SAS and SPSS. While SAS is a macro language designed for statistics, and SPSS is a macro language designed for statistics with a very nice graphical front-end, R looks like a dialect of C, acts like a dialect of LISP, and functions as a nifty alternative to SPSS and SAS. As I come from a programming background, R is beautiful in concept.

R’s learning curve is steep. If Perl tries to make ‘easy things easy and hard things possible,’ then R’s philosophy seems to be ‘make hard things easy and easy things hard.’ Some procedures that are complex and tedious in SPSS are one-liners in R, such as multiplying the inverse of a correlation matrix by the loadings from a one-factor Principal Component Analysis, or PCA (in that case, it would be solve(ad.data.cor) %*% as.matrix(principal(ad.data.cor,nfactors=1)$loadings)). Other tasks require a deeper understanding of the material, however. For example, in SPSS creating a ‘Component Score Coefficient Matrix’ after a PCA is as simple as ticking a check box, or adding a simple request in the macro code. In R, you need to realize that the Component Score Coefficient Matrix is actually just the inverse of the correlation matrix multiplied by the loadings from the PCA, so you’d enter that same line: solve(ad.data.cor) %*% as.matrix(principal(ad.data.cor,nfactors=1)$loadings).
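To make the "inverse times loadings" idea concrete, here is a minimal sketch in base R only. It does not use principal(); instead it gets the one-component loadings from an eigendecomposition, and it uses the built-in USArrests data as a stand-in, since ad.data.cor itself is not shown here. The names example.cor, loadings1, and score.coef are made up for illustration, and the signs of the result may be flipped relative to what principal() reports.

```r
# Stand-in correlation matrix (assumption: the post's ad.data.cor is not
# available, so the built-in USArrests data set is used instead).
example.cor <- cor(USArrests)

# One-component PCA via eigendecomposition of the correlation matrix.
eig <- eigen(example.cor)

# Loadings for the first component: eigenvector scaled by sqrt(eigenvalue).
loadings1 <- eig$vectors[, 1, drop = FALSE] * sqrt(eig$values[1])
rownames(loadings1) <- rownames(example.cor)

# Component score coefficients: inverse correlation matrix times loadings.
score.coef <- solve(example.cor) %*% loadings1
print(score.coef)
```

Because the loadings are an eigenvector times the square root of its eigenvalue, multiplying by the inverse correlation matrix just rescales that eigenvector by one over the square root of the eigenvalue, which is a handy sanity check on the result.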

By far the coolest part of R and PCA is learning what unknown unknowns you forgot to solve for. For instance, a bundle of seemingly meaningless data can be examined through a ‘scree plot’ to see which of the things you forgot to measure (‘latent variables’) mattered, and which did not.
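A scree plot is just the eigenvalues of the correlation matrix plotted in decreasing order. The sketch below draws one in base R, again using the built-in USArrests data as a stand-in (an assumption, not the data from this post). The dashed line at 1 marks a common rule of thumb for which components to keep; treat it as a guide rather than a hard cutoff.

```r
# Stand-in correlation matrix (assumption: built-in USArrests data).
example.cor <- cor(USArrests)

# Eigenvalues come back sorted from largest to smallest.
eigenvalues <- eigen(example.cor)$values

# Share of total variance captured by each component.
prop.var <- eigenvalues / sum(eigenvalues)

# Scree plot: look for the "elbow" where the curve flattens out.
plot(eigenvalues, type = "b",
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)  # rule-of-thumb cutoff at an eigenvalue of 1

print(round(prop.var, 3))
```

Components to the left of the elbow are the ones that carry most of the variance; the rest are mostly noise.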

Unknown Unknowns? That’s the R

Feeling Pretty l33t

For multivariate analysis, a lot of the examples are designed for SPSS, a common statistical package. Sadly, I no longer have a valid license to use SPSS, and PSPP (the open-source version) is missing the functionality that I need. (PSPP doesn’t work on Windows without Cygwin, or well at all unless under Linux, hence my use of Sun’s VirtualBox.) Luckily, I was able to download R (the open-source version of S+) and do the work in that. Meanwhile, the Google Chrome browser is pretty good (I’m using it for most tasks now, but won’t be switching anyone else over), and I’m listening to Dr. Horrible’s Sing-Along Blog while doing it all.