Understanding R · 02/27/2007 05:44 PM, Analysis

R Notes
http://www.r-project.org/
R Wiki

(If you’ve found this via searching, you may enjoy the entire series of R articles, found via the navigation link on the right, R Statistical System. These are all in “somewhat random notes” style, but they’ve been helpful to me in the past. Feel free to ping me with updates or suggestions.)

Introduction

Ok, why all this? I wanted to pull together notes for folks who are pretty savvy and need to understand the quirks of R. Yes, there are lots of docs out there (I link to some of the better ones below) and yes, these notes aren’t always well organized… but they try to focus on the “getting the job done” parts, not the stats-tutorial or stats-programmer approach most other docs take.

Oh, the “official” docs? They stink for beginners or folks with little time on their hands to trudge through them. http://cran.r-project.org/manuals.html are the “official” ones, and http://cran.r-project.org/other-docs.html are the contributed ones by users trying to make it better (but still, tough slogging ahead).

As you get more into R, these will all become more useful, but don’t worry if you get stuck on these in the early days: they are all pretty technical, both programming-wise and computational-statistics wise. But if you want to see how to work R like the masters, this is a good place to dig.

Best Manuals:
None of them are great, but these are the best of what I’ve seen in my reading. (I will expand and review them all later)
Using R for Data Analysis and Graphics: Introduction, Code and Commentary by J. H. Maindonald (the updated version), which fed into much of this guide; credit to Maindonald!
Verzani-SimpleR.pdf has some very nice graphs, and also explains how to read the output, which is very helpful.

(There is also a Using R PDF which is the older Maindonald book, linked for historical reasons)

Quicker but still pretty good: Notes on the use of R for psychology experiments and questionnaires by Jonathan Baron and Yuelin Li

Fantastic guide for those who are very familiar with SAS and SPSS:
R for SAS & SPSS users PDF at the Author’s own site, http://rforsasandspssusers.com/

There are many contributed docs to R, and you can click here to see docs sorted by most recently updated. CRAN Other Documentation shows some commentary around some of the more general tutorials and docs, some of which I reference above.

GREAT list of tips:
RTips, aka StatsRUs. These are slowly migrating to the R Wiki Tips Section so check there also.

Don’t ever forget about the general R FAQ and the Windows specific R FAQ.

Another place to dig is in the R Mailing lists but as I post elsewhere, folks are not nice to newbies. Be prepared to slog through some really rude responses by people who don’t remember what it was like when they were just starting out. Also, the mantra to remember: It’s Open Source, get used to it.

The Most Important Things to Know about R
I will assume you’ve used other programs such as SPSS, Mystat or even Excel. I also assume Windows. Note that R will often top out at around 100k rows and 20 variables because it stores everything in memory, depending on the type of data you have. You may do better with Linux than with Windows; Linux has a better memory model for the big stuff. Yes, Windows will top out at 2GB until we get to 64-bit, but if you are adventurous, the R for Windows FAQ has some hideously complex suggestions for potentially working around this issue (there is more about this near the bottom of the post).

There are 4 things to remember in working in R:

  1. Everything is an object. This means that your variables are objects, but so are output from analyses. Everything that can possibly be an object by some stretch of the imagination… is an object.

  2. R works in columns, not rows. We normally think of data as 1 line per person (or observation), with a collection of variables recorded per person. But R thinks of variables first, and when you line them up as columns, then you have your dataset. Even though it seems fine in theory (we analyze variables, not rows), it becomes annoying when you have to jump through hoops to pull out specific rows of data with all variables.

  3. R likes lists. If you aren’t sure how to give data to an R function, assume it will be something like this: c("item 1", "item 2"), meaning “combine the 2 objects named item 1 and item 2 into one collection”. Also, “list” is different to R from “vector” and “matrix” and “dataframe” etc. ad nauseam (strictly, c() builds a vector). But beyond the “specific meaning” aspects, which you can deal with later, you get the idea. (There is a short sketch of points 1-3 just after this list.)

  4. It is open source. It won’t work the way you want. It has far too many commands instead of optimizing a core set. It has multiple ways to do things, none of them really complete. People on the mailing lists revel in their power over complexity, lack of patience, and complete inability to forgive a novice. We just have to get used to it, grit our teeth, and help them become better people.
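Here is a tiny sketch of points 1-3 in action (the people/ages data are made up for illustration; cars is a sample dataset that ships with R):

fit <- lm(dist ~ speed, data=cars)   # the fitted model is itself an object...
summary(fit)                         # ...so you can hand it to other functions
age  <- c(23, 45, 31)                # one "column" (a vector) built with c()
name <- c("Al", "Bo", "Cy")          # another column
people <- data.frame(name, age)      # line the columns up and you have a dataset
people[2, ]                          # pulling out row 2 takes this extra bit of syntax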

GUI
There aren’t many good ones. One thing which helps me keep my sanity is the pretty good start of JGR, which still has bugs but includes a “data editor” similar to a spreadsheet, plus color syntax highlighting and command tooltips to help your syntax. In a later entry entitled R GUIs, I review a few GUIs, web front-ends, and editors.

BTW, in JGR: To run part of your syntax file when open in the handy editor, select it and press <Cmd><Enter> on the Mac, or <Ctrl><Enter> on the PC. No docs on this, but that’s how it’s done. Also, to get to the data editor, use the Object Explorer and then double-click on your data frame… voila. The Edit command doesn’t work as of this posting. There are more tips for JGR in the R GUIs entry here.

Help
There is online help, but it’s hard. help() is your starting place. help(plot) gives help for the plot command. help.search("plot") searches the help keywords as expected, and apropos("plot") lists all functions with the word plot in their name. Note that some of this help is aimed at programmers, not those of us who need to know how to get something done.

help.start() will pop up a more menu driven approach, but still not all that helpful. Basically, help.start() starts the browser version of the help files.

example(command1) prints an example of the use of the command. This is especially useful for graphics commands. Try, for example, example(contour), example(dotchart), example(image), and example(persp).

DATA

Basically, everything in R is an object. Assume an object is either a number or word, a collection of numbers/words, or the results of a procedure. BTW: identifiers are Case Sensitive! Comments are prefixed by the # character, a line at a time (meaning you need the # on each line).

Most stats packages lay the data in a simple fashion: column heads are variable names, and each row is a new entry. R turns this on its head a bit. You start off with columns of numbers (i.e., each variable on its own) and you merge them into a “data frame”, akin to SAS’s dataset or SPSS’s datafile.

Yes, this is annoying. It’s open-source; get used to it.

BTW: q() is the quit command. Why not just quit? Ya got me. Altogether now: It’s open source; get used to it. Anyway… q("yes") saves everything.

If you just want to “clear the decks”, consider rm(list=ls()) which deletes all objects.

Some tips on reading and manipulating data are in this PDF.

Getting data into the system…

myDataFrame <- read.table("c:/austpop.txt", header=T)

As you’d expect, you can play with what delimiter is used (sep=), and here, the first line holds the headers and is read as the column names. (There is always more than one way to do it, just like Perl. For example, there is also read.csv(), etc. Check your manuals!)

Not all of these wind up in a data frame format; it’s pretty specific. If you aren’t sure, just force it:
mydata<-read.csv('C:/data.csv'); mydf<-as.data.frame(mydata) ## data as dataframe
(yes, a semicolon can separate multiple commands.)

Typing in data:
Pretty similar. In this case, we have a “c” function which combines numbers into a “column”:
t1 <- c(1,2,3,4,5)

There is a mini “spreadsheet” for editing and adding data if you wish to raw-type:
xnew <- edit(data.frame())

If you have a couple of these, then you can combine them manually into a data frame with the data.frame function:
elasticband <- data.frame(strch = c(46,54,48,50,44), dist = c(148,182,173,166,109))

If you want to edit your stuff in a very (very!) basic data editor, the default R install has one:
elasticband <- edit(elasticband)

NOTE: You have to assign the result of the edit back to the object, or you lose all your edits. Bad thing.

(BTW: Just typing the name of the object at the prompt, like elasticband, will print out your data frame (the dataset you read). This works for almost any object in R: type its name and it just dumps its contents. Handy.)

Besides just typing a name, print(dataframe) will print your dataframe nicely formatted.

To see the names of the variables currently in the dataframe…
names(myDataFrame)

You can also have “row labels” with row.names(myDataFrame). This is somewhat rare; it’s basically picking one of the variables to be a “row label”. Use it if you need to, but I haven’t found a good use for it except in hacking tables to make output look better.

BTW, here’s a fun one: You can read from the clipboard, handy for quick grabs from Excel: read.table() will read from the clipboard (via file = "clipboard" or readClipboard).
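For example, after copying a block of cells in Excel (the clipboard holds it tab-delimited; header=TRUE assumes your selection included the header row):

fromExcel <- read.table("clipboard", header=TRUE, sep="\t")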

All of this (and more) are in the “official guide to R Input and Output” at http://cran.r-project.org/doc/manuals/R-data.pdf.

The R site has some info on reading in SPSS info here. It basically says “Function read.spss can read files created by the `save’ and `export’ commands in SPSS. It returns a list with one component for each variable in the saved data set. SPSS variables with value labels are optionally converted to R factors.” This is part of the foreign package. Packages are add-ins which are listed by library() and loaded by library(foreign) (in this case). I have lots more about packages elsewhere, including R Packages.

In practice, it looks like this:

library(foreign); MyDataSet <- read.spss("c:\\junk\\file.sav",to.data.frame=TRUE)

Yes, I did find that I needed double slashes. The data frame is the object MyDataSet.

What really threw me? Dates/Timestamps. SPSS stores date/time values as the number of seconds since October 14, 1582 (the start of the Gregorian calendar) (see http://www.childrensmercy.com/stats/data/dates.asp). So you have to do some calcs to convert those dates back to something R can use.
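For the record, here is a minimal base-R sketch of that shift (mydata$SPSSDATE is a hypothetical variable holding the raw SPSS seconds):

# shift the SPSS origin (midnight, 1582-10-14, counted in seconds) onto R date-times
mydata$when <- ISOdate(1582, 10, 14, 0) + mydata$SPSSDATE
mydata$day  <- as.Date(mydata$when)   # if you only care about the calendar day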

From a post on the R-Help list, here is one way:
library(chron)
as.chron(ISOdate(1582, 10, 14) + mydata$SPSSDATE)
as well as this post which points out that spss.get in package Hmisc can handle SPSS dates automatically. This and additional discussion on SPSS dates is available in the Help Desk article in R News 4/1.

After loading, type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview to see overall documentation.

There are a few things to consider. Start with
dataset <- spss.get("c:\\junk\\WN User Survey Final Data.sav")
but you may want to add the charfactor=T if you want to convert character variables to factors. Play with it, it may help or hurt. You can always do it later.
dataset <- spss.get("c:\\junk\\WN User Survey Final Data.sav", charfactor=T)

Databases
RODBC handles ODBC connections to databases
channel <- odbcConnect("DSN")
odbcGetInfo(channel)  # prints useful info
sqlTables(channel)    # gets all the table names; is there a way to filter this?

Don’t forget to close(channel) or odbcClose(channel) at the end.
Function sqlSave copies an R data frame to a table in the database, and sqlFetch copies a table in the database to an R data frame.

An SQL query can be sent to the database by a call to sqlQuery. This returns the result in an R data frame. (sqlCopy sends a query to the database and saves the result as a table in the database.)

data1 <- sqlQuery(channel,"select * from dual")

If you need multiple lines for a long query, use the paste() function to assemble a full query. This can also be used to create substitutions, etc.
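For example (channel is the connection from above; the table and column names here are hypothetical):

qry <- paste("select region, count(*) as n",
             "from orders",
             "where order_year = 2006",
             "group by region")
byRegion <- sqlQuery(channel, qry)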

A finer level of control is attained by first calling odbcQuery and then sqlGetResults to fetch the results. The latter can be used within a loop to retrieve a limited number of rows at a time, as can function sqlFetchMore.
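A rough sketch of that loop, assuming sqlGetResults() stops returning a data frame once the rows run out (the table name and block size are made up; see ?sqlGetResults for your version’s exact behavior):

odbcQuery(channel, "select * from bigtable")
repeat {
  chunk <- sqlGetResults(channel, max=10000)             # pull up to 10,000 rows at a time
  if (!is.data.frame(chunk) || nrow(chunk) == 0) break   # nothing left to fetch
  # ... process this chunk, then loop around for the next block ...
  if (nrow(chunk) < 10000) break                         # a short block means we just got the tail
}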

And remember, you can read from spreadsheets via ODBC as well… but read only, no write back via ODBC! Note the use of the different connect, odbcConnectExcel.
library(RODBC)
channel <- odbcConnectExcel("bdr.xls")
## list the spreadsheets
sqlTables(channel)
## Either of the below will read in Sheet 1:
sh1 <- sqlFetch(channel, "Sheet1")
sh1 <- sqlQuery(channel, "select * from [Sheet1$]")

(Also, there is a DBI approach (similar to Perl’s DBI) which works kind of similarly. See package DBI:
library(DBI)
library(ROracle)
ora <- dbDriver("Oracle")

To run the Windows binary package ROracle_.zip you’ll need the client software from Oracle. You must have $ORACLE_HOME/bin in your path in order for R to find the Oracle runtime libraries. The binary is currently not on CRAN (grrr), but only at http://stat.bell-labs.com/RS-DBI/download/index.html. Note that you would do better to compile this yourself, or better yet, skip it and just use RODBC. )

Some more is in http://cran.r-project.org/doc/manuals/R-data.pdf and it is well worth a read.

Loading DataSets
If you saved it with “save” (see below), then you can use the load() command. For example, load("thatdataset.Rdata")

data(name) loads a data set attached to a package (i.e., in its search path).
data() lists the data sets in the currently loaded packages. data() can also be used to load other things; for most purposes, I would stick with load() (and save(), see below).

Use data(package = .packages(all.available = TRUE)) to list the data sets in all available packages; this is handy for seeing just what sample data you have available for testing or demo purposes.

attach(data.frame1) makes the variables in data.frame1 active and available generally, i.e., without the data.frame1$ prefix (you can also attach a previously saved .RData file). It’s like load(), but only loads when it needs it… kind of an “on deck” command. Yes, this is confusing; sorry bout that.

To load a collection of commands (i.e, a script or command file), try source("commands.R"). sink("record.lis") sends all output to a file; sink() turns this off.

Saving
You can save your entire “workspace” with save.image(file="archive.RData"). These can then be attached or loaded later. save.image() (i.e., nothing in the parens) is just a short-cut for “save my current environment”, equivalent to save(list = ls(all=TRUE), file = ".RData"). It is what also happens with q("yes").

A more common thing is to just save useful variable objects:
save(celsius, fahrenheit, file="tempscales.RData"). Note that these are all binary files, so they can move from platform to platform, but are UNREADABLE IN ANY OTHER SOFTWARE. You were warned.

(BTW: R, by default, can save the entire workspace as the hidden file .RData. This can cause confusion later on, so be careful. This whole “saving the workspace” is painful, and in fact, the R folks suggest using a different directory for each “set” of analyses you do so you can store the whole thing in .RData and .Rhistory per directory. This is kludgy.)

WORKING WITH DATA
names(obj1) prints the names, e.g., of a matrix or data frame. (aka variable names)

List of objects: ls()
rm(object1) removes object1. To remove all objects, say rm(list=ls()).
Size of Objects: dim() or (for vectors) length().

Multiple vectors (columns) get linked into a data frame. This is pretty central to how R works. Unlike SPSS (up through v14), R can hold multiple datasets in memory at once, each named.

This data frame stuff will drive you up a wall. I’ve mentioned it elsewhere as well, so keep trying. If each variable is a vector or array or list, you can make a “list of lists”. The dataframe is the list of variables; each variable is its own list/array. (Yes, List and Array and Vector are all special terms in R, so I shouldn’t use them interchangeably. Sorry.)

The data.frame() function puts together several vectors into a dataframe, which has rows and columns like a matrix.

Quickest program:
x<-rnorm(1000)
hist(x)

save(x1,file="file1") saves object x1 to file file1. To read in the file, use load("file1").

q() quits the program. q("yes") saves everything.

Formatting
options(digits=3) sets digit printout to 3 decimals

Data Manipulation
Besides lots of formulas, each column (variable) is basically a vector, and so you can do all sorts of vector stuff like concatenate, subset, etc.

Now, remember, everything is an object. So, think in terms of functions on the entire object, and assume you can’t use loops (you can, but they suck). The good news is that this gives lots of interesting possibilities. For example, since x[2] is item 2 of x, you can also do x[x>3] to magically see all the items with values greater than 3.

To expand a vector or whatever, just assign something out of its current range. To truncate, set the length to whatever you want or use the index:
x <- x[c(1,2,3,5)] keeps only the 4 items referenced there.

In addition, like a SQL join, calculations will expand vectors. So, if you multiply a single value by a vector with 5 items, it’s like the single value is applied across all 5 items (R calls this recycling). Similarly, the function sapply() takes as arguments the data frame and the function that is to be applied, and applies it across the columns.
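For instance (x and mydf here are throwaway examples):

x <- c(2, 7, 1, 9, 4, 6)
x[x > 3]                       # 7 9 4 6 -- no loop needed
x * 10                         # the single value 10 is recycled across all six items
mydf <- data.frame(a=1:5, b=(1:5)^2)
sapply(mydf, mean)             # mean() applied to every column: a=3, b=11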

rep(thing, count) replicates thing count times.

Now, a vector with names (levels) gets a special name: factor. Basically, it’s like AUTORECODE in SPSS. Take a column, convert it:
state <- factor(state)
This reduces storage, and some R functions expect a “factor”. Like SPSS, levels (integers) are assigned in alpha sort order of the levels, not in order of appearance. If that’s a problem, use relevel(..., ref="LA") to force a specific level to be the 0th or reference category. You can even just manually force the order with factor(..., levels=c("LA", "MA", ...)) if you want.
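A small sketch (the state values are made up):

state <- c("MA", "LA", "LA", "NY")
state <- factor(state)                              # levels sort alphabetically: LA MA NY
state <- relevel(state, ref="MA")                   # force MA to be the reference level
state <- factor(state, levels=c("NY", "MA", "LA"))  # or just dictate the whole order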

Besides Vectors, you can have Arrays/Matrices (2 or more dimensions, all of same type), Data Frames (basically an array with each column as different type), Lists (Vectors with vectors in them, like a nested array), and Strings (basically, character vectors).

Strings prefer double quotes, and use C-style escape sequences.

BTW, all the stuff you do with numerics can be done with characters, like repeats; paste() is a string combiner:
labs <- paste(c("X","Y"), 1:10, sep="") becomes c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")

Every object in R has a mode() and a length(). Other attributes are available via the attributes() function.

as.character(), as.integer(), etc. convert objects to new modes.

When you have a dataframe, you can get to its columns a couple of ways:

This whole “access” or “extraction” thing is painful. ?extract gives more details. Basically, you can use [] or $, or [[]]. You can even get help on them: help("[["). Here are some more details.

With much help from the “Introduction to R” in the next few paras:
An R list is an object consisting of an ordered collection of objects known as its components. There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function, and so on. Here is a simple example of how to make a list:
Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))

Components are always numbered and may always be referred to as such. Thus if Lst is the name of a list with four components, these may be individually referred to as Lst[[1]], Lst[[2]], Lst[[3]] and Lst[[4]]. (If, further, Lst[[4]] is a vector subscripted array (as it is in the example) then Lst[[4]][1] is its first entry.)

If Lst is a list, then the function length(Lst) gives the number of (top level) components it has.

List components can have names; you can see how they were hand-typed in the example. So, name$component_name is another way to get to the data. Lst$name is the same as Lst[[1]] and is the string “Fred”.

Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful, when the name of the component to be extracted is stored in another variable as in
x <- "name"; Lst[[x]]

It is very important to distinguish Lst[[1]] from Lst[1]. `[[...]]’ is the operator used to select a single element, whereas `[...]’ is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist.
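Concretely, with the Lst example from above:

Lst[[1]]                  # "Fred" -- the component itself, a plain character string
Lst[1]                    # a list of length one, still carrying the name "name"
Lst[["child.ages"]][2]    # 7 -- second entry of the child.ages component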

A data frame is basically just a list with class “data.frame”.

If you have a dataframe and you don’t want to keep typing “dataframe$variable”, you can attach(dataframename) and just use the variable names, and then detach(dataframename) when you are done. If you are just using one dataframe, this is pretty handy.
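A quick sketch (survey and q1 are hypothetical names):

attach(survey)
summary(q1)        # q1 instead of survey$q1 while attached
detach(survey)     # put things back when you are done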

Remember, it’s dataframe[ROWS, COLUMNS]. So if you want all rows for column 3, try mydataframe[,3]. If you leave out the comma, R assumes you meant the column, so mydataframe[3] is the same, for the most part, as mydataframe[,3]. Names are more annoying. If the columns have names, you can do one with mydataframe["q1"], but if you want more than one name, you have to use the c() function! mydataframe[c("q1","q3","q9")].

Now, if you like the $ approach, that’s fine… but if you want to select multiple variables, you have to recreate the dataframe, so in effect: summary(data.frame(mydata$q1, mydata$q3)). Yes, this is annoying.

Ok, all that mishagas aside, the subset() seems to be the winner:
subset(dataset,(dataset$Q4>=18 & is.na(dataset$Q8)==F))

=========

BTW... R loves to convert character data into factors AUTOMATICALLY when you create the dataframe to save memory. This can be REALLY ANNOYING. If you don’t want this, consider:
data.frame(v1, I(v2))
The I() function says “interpret this as raw, no transform”, basically. More at ?data.frame and ?read.table.
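A hedged sketch of both routes (v1/v2 and the file name are made up; as.is=TRUE is one read.table way to skip the conversion):

v1 <- 1:3
v2 <- c("a", "b", "c")
df1 <- data.frame(v1, I(v2))                               # I() keeps v2 as character
df2 <- read.table("c:/data.txt", header=TRUE, as.is=TRUE)  # no factor conversion on read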

Deleting a column, a few ways:

data(iris)
iris[,5] <- NULL

data(iris)
iris$Species <- NULL

data(iris)
iris[,"Species"] <- NULL

or

Newdata <- subset(d2004, select=-c(concentration,stade))

or

mydf2 <- as.data.frame(matrix(runif(100),ncol=20))
### if you want to erase the third column, do:
mydf2 <- mydf2[,-3]
### if you want to erase the first, fifth and twentieth column, do:
mydf2 <- mydf2[,-c(1,5,20)]

see http://cran.r-project.org/doc/contrib/usingR-2.pdf page 22 for useful functions

Checking for Nulls: NA is the R “null” (strictly, the missing-value marker; NULL is a separate thing), and you aren’t supposed to test for equality to it. Instead, use is.na(x)

Getting Uniques: df[!duplicated(df$colname),] You could also use aggregate but aggregate() is basically a wrapper for tapply and tapply basically loops in R. duplicated() loops in C (and uses hashing).

How to count distinct or count unique in a column? well, unique() returns the uniques in a vector. sort(unique()) returns the sorted uniques.

And length() tells the number of items in a vector… but gives the number of columns in a dataframe (which is a list of lists, each internal list being a variable vector). Annoying, huh? dim() gives the size of a dataframe. So, if you are looking to count a variable, use length, but be careful, ok?
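Putting that together, to count distinct values in one column (mydf$colname is a placeholder):

length(unique(mydf$colname))    # count distinct
table(mydf$colname)             # or the count of each distinct value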

Unique Lists from Lists: https://stat.ethz.ch/pipermail/r-help/2004-September/056830.html

cbind, sapply, tapply, mapply, split become your best friends.
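For a taste of tapply() and split() (tiny made-up vectors):

g <- c("a", "a", "b", "b", "b")
v <- c(1, 2, 3, 4, 5)
tapply(v, g, mean)    # mean of v within each level of g: a=1.5, b=4
split(v, g)           # the raw groups as a list: $a = 1 2, $b = 3 4 5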

GRAPHICS
plot() does x-y plots
pairs() does great pairwise x-y plots… very handy.
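For example, with R’s built-in iris data:

plot(iris$Sepal.Length, iris$Petal.Length)   # simple x-y scatterplot
pairs(iris[, 1:4])                           # every pairwise scatterplot at once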

——————————————

help.start() starts the browser version of the help files, or just help().
help(command1) prints the help available about command1. help.search("keyword1") searches keywords for help on this topic. apropos(topic1) or apropos("topic1") finds commands relevant to topic1, whatever it is.

——————————————————-

If you are a newsgroup kinda person, try
http://gmane.org/info.php?group=gmane.comp.lang.r.general

Tables Tips:
https://stat.ethz.ch/pipermail/r-help/2004-September/055438.html

Cat vs. print vs. format:
If one wants to display a character string with control over newlines, then one typically uses cat. If one wants to display an object, one uses print, or else converts it to a character string using format or as.character and then displays it using cat.

Linking to Excel?
https://stat.ethz.ch/pipermail/r-help/2004-September/055724.html

Big Data and Memory
Getting all your memory:
Rgui.exe --max-mem-size=2Gb or --max-mem-size=2000M

But I reprint (edited) from the R FAQ for Windows:

2.9 There seems to be a limit on the memory it uses!

Indeed there is. It is set by the command-line flag --max-mem-size and defaults to the smaller of the amount of physical RAM in the machine and 1.5Gb. It can be set to any amount between 32Mb and 3Gb. Be aware though that Windows has (in most versions) a maximum amount of user virtual memory of 2Gb. Use ?Memory and ?memory.size for information about memory usage. The limit can be raised by calling memory.limit within a running R session. The executables Rgui.exe and Rterm.exe support up to 3Gb per process under suitably enabled versions of Windows (see http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx: even where this is supported it has to be specifically enabled). On such systems, the default for --max-mem-size is the smaller of the amount of RAM and 2.5Gb.
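In practice, from inside a running Windows session you can check and nudge the ceiling like this (2000 Mb is just an example target):

memory.size()             # Mb currently in use
memory.limit()            # the current ceiling, in Mb
memory.limit(size=2000)   # try to raise the ceiling to roughly 2Gb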

From the R-Help Mailing List:
Duncan Murdoch, Friday, March 03, 2006 says:

R can deal with big data sets, just not nearly as conveniently as it deals with ones that fit in memory. The most straightforward way is probably to put them in a database, and use RODBC or one of the database-specific packages to read the data in blocks. (You could also leave the data in a flat file and read it a block at a time from there, but the database is probably worth the trouble: other people have done the work involved in sorting, selecting, etc.) The main problem you’ll run into is that almost none of the R functions know about databases, so you’ll end up doing a lot of work to rewrite the algorithms to work one block at a time, or on a random sample of data, or whatever.

From a post at DecisionStats :
A very rough rule of thumb has been that the 2-3GB limit of the common 32bit processors can handle a dataset of up to about 50,000 rows with 100 columns (or 100,000 rows and 10 columns, etc), depending on the algorithms you deploy.

=========================

From the R-Help Mailing List:
[R] Re: suggestion on data mining book using R
Vito Ricci on Thu, 20 Jan 2005
Hi, see these links:
http://www.liacc.up.pt/~ltorgo/DataMiningWithR/
http://sawww.epfl.ch/SIC/SA/publications/FI01/fi-sp-1/sp-1-page45.html

Brian D. Ripley, Datamining: Large Databases and Methods, in Proceedings of “useR! 2004 – The R User Conference”, May 2004
http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Ripley.pdf

and if looking for a book I (Vito Ricci) suggest:

Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001, Springer-Verlag.
http://www-stat.stanford.edu/~tibs/ElemStatLearn/

B.D. Ripley, Pattern Recognition and Neural Networks
http://www.stats.ox.ac.uk/~ripley/PRbook/

==========================

Some other docu links to check out:
http://cran.r-project.org/doc/contrib/Quene.pdf
http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf
http://cran.r-project.org/doc/contrib/Alzola+Harrell-Hmisc-Design-Intro.pdf
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
http://cran.r-project.org/doc/contrib/Rossiter-RIntro-ITC.pdf (Nice, includes some good tips)
http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf (An R Companion to Experimental Design)
http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf (R for Beginners)
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf (Fitting Distributions with R)
http://cran.r-project.org/doc/contrib/Burns-unwilling_S.pdf (The unwilling S user; remember that R is the open source S, so most things will work in both systems).
http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf (Practical Regression and Anova using R)
http://zoonek2.free.fr/UNIX/48_R/all.html


http://www-128.ibm.com/developerworks/linux/library/l-r1/

http://cran.r-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip has lots of stuff in it, including data.
http://cran.r-project.org/doc/contrib/Marthews-BeginnersRcourse.zip has a 9 page rapid fire doc to get you started, includes data.
http://cran.r-project.org/doc/contrib/Lemon-kickstart_1.6.zip has a collection of HTML docs, so a pain to use, but once unzipped, is a good dive in… Already unzipped at http://cran.r-project.org/doc/contrib/Lemon-kickstart/ if you want to see it.

http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8033.html and as.Date, strptime, and chron

Sample dataset all about the Titanic:
loadUrl('http://biostat.mc.vanderbilt.edu/twiki/pub/Main/DataSets/titanic3.sav')

Hoping I helped you.

* * *

 

  1. Excellent stuff! Very good and informative.

    Cheers,

    TOM


    Tom Dierickx    Jan 25, 09:50 PM    #


  2. Heh, I just got some on-the-job training in R, and look what I find online! You’ll be happy to know that with the influx of folks who know R at my current company, we might actually be able to tear the FORTRAN manuals out of the hands of some of the old-school psychometricians – and not a moment too soon!


    Kimberly    Oct 18, 10:21 PM    #


  3. Hi,
    This is my first day in R.
    I want to load a big dataset (1GB). I keep receiving error messages:

    Error: cannot allocate vector of size 3.1 Mb
    In addition: Warning messages:
    1: In rval[[v]][rval[[v]] >= stata.na$min[this.type]] ...
    > Rgui.exe —max-mem-size=2Gb
    Error: unexpected input in “Rgui.exe —”
    > —max-mem-size=2000M
    Error: unexpected input in “—”

    Please help!


    Amadou DIALLO    Oct 27, 06:54 PM    #


  4. Don’t forget the R video tutorial


    Dan    Aug 20, 05:20 PM    #

