Focusing on R Essentials

Randy Pruim
2013-May-18

Less Volume, More Creativity

“A lot of times you end up putting in a lot more volume, because you are teaching fundamentals and you are teaching concepts that you need to put in, but you may not necessarily use because they are building blocks for other concepts and variations that will come off of that … In the offseason you have a chance to take a step back and tailor it more specifically towards your team and towards your players.”

Mike McCarthy, Head Coach, Green Bay Packers

SIBKIS

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

— Antoine de Saint-Exupéry

The Most Important R Template

 

 

goal (  y  ~  x  , data = mydata , …)

 

Simpler version:

  • goal( ~ x, data = mydata )

 

Fancier version:

  • goal( y ~ x | z , data = mydata )

 

Unified version:

  • goal( formula , data = mydata )
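The unified version works because a formula is an ordinary R object that a function can take apart before it looks anything up in the data. Here is a minimal base-R sketch. The names x, y, z and the f_* objects are placeholders. Base R only stores the `|`; lattice and mosaic are what interpret it as "condition on z".

```r
# Three formula shapes from the template. Nothing is evaluated yet,
# so x, y, and z do not need to exist anywhere.
f_simple <- ~ x          # one-sided:   goal( ~ x, data = mydata )
f_basic  <- y ~ x        # two-sided:   goal( y ~ x, data = mydata )
f_fancy  <- y ~ x | z    # conditioned: goal( y ~ x | z, data = mydata )

class(f_fancy)       # "formula"
length(f_simple)     # 2: the ~ and the right-hand side
length(f_basic)      # 3: the ~, the left-hand side, and the right-hand side
all.vars(f_fancy)    # "y" "x" "z": the variables a function must find in data
```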

The Two Questions

 

goal (  y  ~  x  , data = mydata )

 

What do you want R to do? (goal)

  • This determines the function to use

 

What must R know to do that?

  • This determines the inputs to the function
  • Must identify the variables and the data frame
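Here is a sketch of the two questions applied to a base-R function that also accepts the template. It uses the built-in mtcars data as a stand-in, since mtcars is not part of the original slides:

```r
# Q1: What do you want R to do?  Compute a mean for each group -> FUN = mean
# Q2: What must R know to do that?
#     The variables (mpg, split by cyl) and the data frame (mtcars)
means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
means
```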

Graphical Summaries: One Variable

freqpolygon( ~age, data=HELPrct) 

[Figure: frequency polygon of age, HELPrct data]

What is a Frequency Polygon?

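A frequency polygon joins the midpoints of the tops of a histogram's bars with line segments. It shows the same binned counts as a histogram, drawn as a line instead of as bars.

[Figure: illustration of a frequency polygon]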
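A frequency polygon plots one point per histogram bin, at (bin midpoint, bin count), and connects the points. The base-R sketch below computes those points for a small made-up vector of ages. It illustrates the idea only and is not necessarily how mosaic's freqpolygon() chooses its bins or endpoints:

```r
# Made-up ages for illustration only (not the HELPrct data)
ages <- c(19, 23, 25, 28, 31, 33, 34, 35, 36, 38, 41, 44, 52, 60)

# Bin the data the way a histogram would, without drawing anything
h <- hist(ages, breaks = seq(10, 60, by = 10), plot = FALSE)

# The polygon's vertices: one (midpoint, count) pair per bin
poly <- data.frame(mid = h$mids, count = h$counts)
poly

# Drawing it with lattice; the plot is stored, not printed, so no device opens
p <- lattice::xyplot(count ~ mid, data = poly, type = "b")
```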

Graphical Summaries: One Variable

freqpolygon( ~age, data=HELPrct ) 
  histogram( ~age, data=HELPrct ) 
densityplot( ~age, data=HELPrct ) 
     bwplot( ~age, data=HELPrct ) 
     qqmath( ~age, data=HELPrct ) 
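The list above changes only the goal; the inputs stay the same. Here is a sketch with lattice, which supplies histogram(), densityplot(), bwplot() and qqmath(). It uses the built-in mtcars data in place of HELPrct. freqpolygon() comes from mosaic rather than lattice, so it is left out:

```r
library(lattice)

# Same inputs every time: one variable (mpg) from one data frame (mtcars).
# Only the goal, i.e. the function name, changes.
one_var_plots <- list(
  histogram   = histogram(~ mpg, data = mtcars),
  densityplot = densityplot(~ mpg, data = mtcars),
  bwplot      = bwplot(~ mpg, data = mtcars),
  qqmath      = qqmath(~ mpg, data = mtcars)
)

# Each call returns a "trellis" object; print() one of them to draw it
sapply(one_var_plots, class)
```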

Graphical Summaries: Two Variables

xyplot( births ~ dayofyear, data=Births78) 

[Figure: scatterplot of daily births against day of the year, Births78 data]

Graphical Summaries

bwplot( age ~ substance, data=HELPrct) 

[Figure: side-by-side boxplots of age for each substance]

Graphical Summaries

bwplot( substance ~ age, data=HELPrct) 

[Figure: the same boxplots drawn horizontally, one box per substance]
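Swapping the two sides of ~ flips the boxplots because lattice infers the orientation from which side holds the factor. Its `horizontal` argument can override that choice. The sketch below uses the built-in InsectSprays data, chosen here only because it ships with R:

```r
library(lattice)

# InsectSprays: count is numeric, spray is a factor.
# Factor on the x side of ~  ->  vertical boxes, one per spray
vertical   <- bwplot(count ~ spray, data = InsectSprays)
# Factor on the y side of ~  ->  horizontal boxes, one per spray
horizontal <- bwplot(spray ~ count, data = InsectSprays)
```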

The Graphics Template

plotname (  y  ~  x  , data = mydata , …)

 

One variable

  • histogram(), qqmath(), densityplot(), freqpolygon(), bwplot()

Two variables

  • xyplot(), bwplot()

Your turn

Create a plot of your own choosing.

Hints:

names(HELPrct)
  • i1: average number of drinks (standard units) consumed per day in the past 30 days (measured at baseline)
  • i2: maximum number of drinks (standard units) consumed per day in the past 30 days (measured at baseline)

names(Utilities2)

plotname (  y  ~  x  , data = mydata , …)

groups and panels

  • Add groups = ??? to overlay plots on top of each other.
  • Use y ~ x | z to create multipanel plots.
densityplot( ~ age | sex, data=HELPrct,
               groups=substance,
               auto.key=TRUE)

[Figure: density plots of age, one panel for each sex, one curve for each substance, with a key]
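The same two ideas, panels via | and overlays via groups =, appear in the sketch below. It uses only lattice and the built-in mtcars data as a stand-in for the HELP data:

```r
library(lattice)

# Panels: one per transmission type (am: 0 = automatic, 1 = manual).
# Groups: one overlaid density curve per number of cylinders.
p <- densityplot(~ mpg | factor(am), data = mtcars,
                 groups = factor(cyl),
                 auto.key = TRUE)
```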

Numerical Summaries: One Variable

mean( ~ age, data=HELPrct )
[1] 35.65
favstats( ~ age, data=HELPrct )
 min Q1 median Q3 max  mean   sd   n missing
  19 30     35 40  60 35.65 7.71 453       0
tally( ~ sex, data=HELPrct)

female   male  Total 
   107    346    453 
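The formula versions are meant to read like the template, while base R reaches the same numbers with `$`. This sketch assumes the mosaic and mosaicData packages are installed. Passing margins = FALSE asks tally() for the counts without the Total column:

```r
library(mosaic)
library(mosaicData)   # provides HELPrct

# Formula (template) version and base-R $ version of the same summaries
mean(~ age, data = HELPrct)
mean(HELPrct$age)

tally(~ sex, data = HELPrct, margins = FALSE)
table(HELPrct$sex)
```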

Numerical Summaries: Two Variables

sd( age ~ substance, data=HELPrct )
alcohol cocaine  heroin 
  7.652   6.693   7.986 
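For comparison, here is a sketch of the same grouped standard deviations in base R. Base R needs a different function, tapply(), and the `$` operator instead of the template. The sketch assumes mosaic and mosaicData are installed:

```r
library(mosaic)
library(mosaicData)   # provides HELPrct

# mosaic: the summary function takes the template directly
sd(age ~ substance, data = HELPrct)

# base R: split age by substance, then apply sd() to each piece
tapply(HELPrct$age, HELPrct$substance, sd)
```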

Numerical Summaries: Tables

tally( sex ~ substance, data=HELPrct )
        substance
sex      alcohol cocaine heroin
  female  0.2034  0.2697 0.2419
  male    0.7966  0.7303 0.7581
  Total   1.0000  1.0000 1.0000
tally( ~ sex + substance, data=HELPrct )
        substance
sex      alcohol cocaine heroin Total
  female      36      41     30   107
  male       141     111     94   346
  Total      177     152    124   453
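Here is a base-R sketch of both tables, assuming mosaicData supplies HELPrct. The first tally() above gives proportions within each substance, so each column sums to 1; prop.table() with margin = 2 reproduces that layout. addmargins() adds the Total row and column shown in the second table:

```r
library(mosaicData)   # provides HELPrct

# Two-way table of counts, rows = sex, columns = substance
counts <- table(sex = HELPrct$sex, substance = HELPrct$substance)
counts

# Counts with a Total row and a Total column
addmargins(counts)

# Proportions within each substance (each column sums to 1)
prop.table(counts, margin = 2)
```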

Your turn

Create a numerical summary of your own choosing.

Hints:

names(HELPrct)
  • i1: average number of drinks (standard units) consumed per day in the past 30 days (measured at baseline)
  • i2: maximum number of drinks (standard units) consumed per day in the past 30 days (measured at baseline)

names(Utilities2)

summary (  y  ~  x  , data = mydata , …)

  • possible summaries: mean(), median(), min(), max(), sd(), var(), favstats(), etc.
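Here is a sketch of the template with several summaries. It assumes mosaic, whose versions of these functions accept formulas, and mosaicData are installed. Only the function name changes from line to line:

```r
library(mosaic)
library(mosaicData)   # provides HELPrct

# Same formula and data every time; only the goal changes
mean(age ~ substance, data = HELPrct)
median(age ~ substance, data = HELPrct)
var(age ~ substance, data = HELPrct)
favstats(age ~ substance, data = HELPrct)   # several summaries at once
```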

Linear Models

Linear models (regression, ANOVA, etc.) follow the same template:

lm ( formula, data=mydata )
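Here is a sketch of the linear-model template with the HELP data, assuming mosaicData is installed. With a categorical predictor, lm() fits a one-way ANOVA model, so the same fitted object yields both regression coefficients and an ANOVA table:

```r
library(mosaicData)   # provides HELPrct

# Same template: goal = lm, formula = age ~ substance, data = HELPrct
model <- lm(age ~ substance, data = HELPrct)

# With R's default treatment coding, the intercept is the mean age of the
# first level (alcohol); the other coefficients are differences from it
coef(model)

# One-way ANOVA table from the same fitted model
anova(model)
```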