OTHER PLACES OF INTEREST
Danny Flamberg's Blog
Danny has been marketing for a while, and his articles and work reflect a great understanding of data-driven marketing.
Eric Peterson the Demystifier
Eric gets metrics, analytics, interactive, and the real world. His advice is worth taking...
Geeking with Greg
Greg Linden created Amazon's recommendation system, so imagine what he can write about...
Ned Batchelder's Blog
Ned just finds and writes interesting things. I don't know how he does it.
R at LoyaltyMatrix
Jim Porzak tells of his real-life use of R for marketing analysis.
HOW DID YOU GET HERE?
(First off, if you found this page via a web search or bookmark, you may be much happier in the R section of this site, which collects the multiple articles about R, including this one, as well as articles about packages, data manipulation, etc.)
Packages are bundles of additional functionality. They can be analyses, datasets, or just tools. On the Unix side, they come as source code and get compiled on your system. For Windows, the R team has pre-compiled many of them, but sometimes they don’t work. (All together now: it’s open source. Get over it.)
Just as CPAN is the home of all add-ins for Perl, CRAN is the home of all add-ins (packages) for R. While there are a few here and there not mirrored on CRAN, assume CRAN is the best place to start.
What’s on CRAN? Check out http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html
library() lists what’s installed on your current box
search() lists what’s loaded
library(packagename) loads it in
detach("package:packagename") unloads it
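Tying those four commands together, a minimal session looks something like this (MASS is used here only as an example because it ships with every R installation):

```r
# See which packages are installed in your library paths
library()

# See which packages (and other environments) are currently attached
search()

# Attach a package -- here MASS, which ships with R
library(MASS)

# ...use it; Cars93 is a dataset from MASS
head(Cars93)

# Detach the package again when you are done
detach("package:MASS")
```

After the detach() call, "package:MASS" disappears from the search() list and its functions and datasets are no longer directly visible.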
Adding a package? First you have to get it, using
install.packages(name); note the plural. Then, to activate it, use JGR’s package manager, or type the commands below. The path below is your default library dumping ground, but feel free to substitute your favorite path. Remember, package names are case sensitive, so Hmisc needs to be spelled exactly that way.
Don’t forget to run
.refreshHelpFiles() to refresh the help files and indexes; most of the packages include help files these days.
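The install-and-activate commands referred to above probably looked something like this; the library path shown is only a placeholder (substitute your own), and Hmisc is just an example package:

```r
# Download and install the package from CRAN (note the plural: install.packages)
install.packages("Hmisc", lib = "C:/Program Files/R/library")

# Activate it for this session; lib.loc is only needed if you installed
# somewhere other than the default library path
library(Hmisc, lib.loc = "C:/Program Files/R/library")
```

If you install into the default library, you can drop the lib and lib.loc arguments entirely.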
You can set your nearest CRAN to be the default for your session (or put it in a startup file):
options(CRAN = "http://cran.us.r-project.org/")
Then simply say
install.packages("foo")
If you’ve already downloaded the zip with the binary package for Windows, then the
pkgs argument can also be a character vector of zip-file names when CRAN = NULL. The zip files are then unpacked directly.
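For example, assuming you have saved the binary zip somewhere on disk (the file name and path here are made up):

```r
# Install directly from a local Windows binary zip; CRAN = NULL tells
# install.packages not to go to the network. (In later versions of R the
# equivalent argument is repos = NULL.)
install.packages("C:/Downloads/foo_1.0.zip", CRAN = NULL)
```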
Packages can be removed in a number of ways. From a command prompt you can simply delete the package directory, or from within R you can use remove.packages(), as in:
remove.packages("packagename", lib = file.path("path", "to", "library"))
I have no idea if the help files are properly removed as well; perhaps run the refresh command mentioned above to clear out the un-needed help indexes.
summary(packageStatus()) lets you see which packages are up to date and which are not.
update.packages() walks through each new one to let you upgrade it.
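In practice, an upgrade session is just those two calls back to back (both need network access to reach CRAN):

```r
# Summarize which installed packages have newer versions on CRAN
summary(packageStatus())

# Walk through each upgradable package, asking before installing each one
update.packages(ask = TRUE)
```

Setting ask = FALSE upgrades everything without prompting, which is handy for scripted maintenance.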
boot = Bootstrap functions, including some sample data
class = Classification, very handy, including k-nearest-neighbor and SOMs
cluster = Cluster analysis including plots, plus the Clara/Diana/Agnes large-data techniques
datasets = Tons of datasets for sample analyses
foreign = Translators for Minitab, SPSS, S3, SAS, DBF, etc.
graphics = All the basic plots and some clever ones; lattice has more advanced ones
grDevices = Control over graphics display devices
grid = Low-level graphics control; underlies lattice
KernSmooth = Kernel smoothing algorithms (kernel density estimates, etc.)
lattice = Powerful visualization package, similar to the Trellis package from S-Plus; requires the grid package
MASS = Venables and Ripley’s MASS, including datasets, analyses, and examples linked to their book. Lots of good “utility” analyses here.
methods = Package to deal with R internals and programming
mgcv = GAMs (generalized additive models) with GCV smoothness estimation, and GAMMs by REML/PQL
nlme = Linear and nonlinear mixed-effects models
nnet = Feed-forward neural networks and multinomial log-linear models, handy for categorical data analysis
rpart = Recursive partitioning and tree building. Handy for categorical analysis.
spatial = Kriging and point pattern analysis. I have no idea what this does, so worth investigating. I assume it’s a geo-spatial analysis approach.
splines = Regression spline functions and classes
stats = All the stats you ever wanted, from ANOVAs to weighted means, and lots of stuff in between
stats4 = Statistical functions using S4 classes. Looks like wrappers around the more advanced stat calculations.
survival = Survival analysis (Cox model, etc.), including penalised likelihood. Useful for decay analyses. Includes some sample data.
tcltk = Tcl/Tk interface, a GUI toolkit popular on Unix but less accessible on Windows (hence the drive toward JGR and other “more cross-platformy” approaches)
tools = A mixture of random stuff, more useful for R programmers than users
utils = A mixture of random stuff, but actually handy things. Worth reviewing the list of functions here for quick saves.
vcd, “Visualizing Categorical Data”, has been mentioned as a great package for data viz.
http://www.rosuda.org/R/ has the JGR packages
Finally, for the ever-popular clustering of binary data:
For distance-based clustering methods see
sqldf consists of a thin layer over the R packages RSQLite and RMySQL. (The code for accessing RSQLite has been tested, but the code for accessing RMySQL has only been partly tested, and only in the development version of sqldf.) More information can be found from within R by installing and loading the sqldf package and then entering ?sqldf. A number of examples are at the end of this page, and more examples are accessible from within R in the examples section of the ?sqldf help page.
So, for those times when you know exactly how the transform should go in SQL, but you don’t know all the R tricks to get it there… sqldf.
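A quick sketch of the idea, using the built-in mtcars dataset (any data frame in your workspace can be queried the same way):

```r
# install.packages("sqldf")  # if you don't already have it
library(sqldf)

# Express a grouped aggregation as SQL instead of aggregate()/tapply();
# sqldf finds the mtcars data frame by name and treats it as a table
sqldf("SELECT cyl, AVG(mpg) AS avg_mpg
       FROM mtcars
       GROUP BY cyl
       ORDER BY cyl")
```

The result comes back as an ordinary R data frame, one row per distinct cyl value, so it drops straight into the rest of your analysis.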
Another good one: SQLiteDF
http://cran.r-project.org/web/packages/SQLiteDF/index.html and http://code.google.com/p/sqlitedf/. Basically, this replaces your in-memory data frame with a SQLite-backed version, allowing much larger data. As G. Grothendieck, the author of sqldf, pointed out in a comment, this doesn’t give you access to SQL itself, but it can help you deal with larger datasets while staying in an R context and syntax.
Update, 2/6/2008: ff is a very exciting package that got its first big show at the 2007 useR! conference, under the title “The ff package: Handling Large Data Sets in R with Memory Mapped Pages of Binary Flat Files”. What’s great about it is that it appears to work without changing lots of R’s insides.
Andy Edmonds on the Web Analytics group suggested highlighting the ODBC and SQLite connectors. Getting data in and out of databases and other tools is pretty important. Did you know you can control Excel through ODBC? And SQLite is a very small database that you can use when you just gotta do something in SQL that you can’t do easily in R (multi-dataframe joins, etc.). RODBC and RSQLite are good places to start.
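To give a flavor of the SQLite side, here is a minimal round trip through RSQLite (using an in-memory database so nothing touches disk; the table name "cars" is arbitrary):

```r
# install.packages("RSQLite")  # if you don't already have it
library(DBI)
library(RSQLite)

# An in-memory SQLite database; pass a file path instead for persistence
con <- dbConnect(SQLite(), ":memory:")

# Push a data frame into the database as a table
dbWriteTable(con, "cars", mtcars)

# Run SQL against it and get an ordinary data frame back
dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM cars GROUP BY cyl")

# Always clean up the connection when done
dbDisconnect(con)
```

The RODBC package follows the same connect/query/disconnect shape, just pointed at an ODBC data source name instead of a file.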
* * *