The R Statistics Language
September 20, 2008 at 6:52 pm
R (also called GNU R, or even GNU S) is the open-source implementation of the S Programming Language, a language which fulfills the same statistical needs as SAS and SPSS. While SAS is a macro language designed for statistics, and SPSS is a macro language designed for statistics with a very nice graphical front-end, R looks like a dialect of C, acts like a dialect of LISP, and functions as a nifty alternative to SPSS and SAS. As I come from a programming background, R is as beautiful in concept as Lady of tdaxp is in form.
R’s learning curve is steep. If Perl tries to make ‘easy things easy and hard things possible,’ then R’s philosophy seems to be ‘make hard things easy and easy things hard.’ Some procedures that are complex and tedious in SPSS and SAS are one-liners in R, such as multiplying the inverse of a correlation matrix by the loadings from a one-factor Principal Component Analysis, or PCA (in that case, it would be solve(ad.data.cor) %*% as.matrix(principal(ad.data.cor,nfactors=1)$loadings)). Other tasks require a deeper understanding of the material, however. For example, in SPSS creating a ‘Component Score Coefficient Matrix’ after a PCA is as simple as ticking a check box, or adding a simple request in the macro code. In R, you need to realize that the Component Score Coefficient Matrix is actually just the inverse of the correlation matrix multiplied by the loadings that the PCA produces for it: so you’d enter the line solve(ad.data.cor) %*% as.matrix(principal(ad.data.cor,nfactors=1)$loadings).
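To make the "inverse times loadings" idea concrete, here is a minimal sketch that uses only base R, so it runs without extra packages. It is not the one-liner above: the principal() call there appears to come from the psych package, and this version swaps in a hand-rolled one-component PCA built on eigen(). The built-in USArrests dataset is a hypothetical stand-in for ad.data, and the names e, loadings, score.coef and scores are made up for illustration.

```r
# Hypothetical stand-in for ad.data: a built-in dataset with four numeric columns.
ad.data <- USArrests
ad.data.cor <- cor(ad.data)

# One-component PCA by hand: eigendecomposition of the correlation matrix.
e <- eigen(ad.data.cor)

# Loadings = first eigenvector scaled by the square root of its eigenvalue.
loadings <- e$vectors[, 1, drop = FALSE] * sqrt(e$values[1])
rownames(loadings) <- colnames(ad.data)

# Component Score Coefficient Matrix: inverse correlation matrix times loadings.
score.coef <- solve(ad.data.cor) %*% loadings

# Component scores: standardized data times the coefficients.
scores <- scale(ad.data) %*% score.coef

print(score.coef)
```

The sign of an eigenvector is arbitrary, so these coefficients may come out with the opposite sign from what another tool reports. With a single component there is nothing to rotate, so the loadings should otherwise agree with an unrotated one-factor solution.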
By far the coolest part of R and PCA is learning what unknown unknowns you forgot to solve for. For instance, a bundle of seemingly meaningless data can be examined through a ‘scree plot’ to see which of the things you forgot to measure (‘latent variables’) mattered, and which did not.
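A scree plot is just the eigenvalues of the correlation matrix plotted in decreasing order. The following is a rough base R sketch, again with the built-in USArrests dataset as a hypothetical stand-in for ad.data. The dashed line at 1 is a common rule of thumb (keep components whose eigenvalue exceeds 1), not something this post prescribes, and the names ev and prop.var are made up for illustration.

```r
# Hypothetical stand-in for ad.data.
ad.data <- USArrests

# Eigenvalues of the correlation matrix, returned in decreasing order.
ev <- eigen(cor(ad.data))$values

# Scree plot: look for the "elbow" where the curve flattens out.
plot(ev, type = "b",
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)  # rule-of-thumb cutoff, an assumption of this sketch

# Share of total variance carried by each component.
prop.var <- ev / sum(ev)
print(round(prop.var, 3))
```

Components to the left of the elbow are the candidates for latent variables that mattered; the flat tail to the right is mostly noise.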
Unknown Unknowns? That’s the R