Test file

Basics of Data Science in R & RStudio

  USCOTS 2015

  Workshop leaders: Daniel Kaplan & Nick Horton

Workshop Times

  • Wednesday May 27: 8:30am - 4:00pm. Morning and afternoon sessions with a break for lunch (participants are on their own for lunch).
  • Thursday May 28: 8:30am - 12:00pm. Morning session only.

Prior to the Wednesday start.

Come a bit before the official 8:30 am start to get help with these.

  • Making sure you have access to the software.
    • R 3.1.1 or more recent as well as RStudio preview edition
      or, alternatively,
    • use a web browser to access the RStudio server being provided for this workshop at http://macalester.edu/rstudio. You will need login credentials which will be provided at the start of the workshop. (N.B., this server will be available only for the duration of USCOTS 2015. We will show you how to preserve your files for later use on another system.)

    Whichever way you access R and RStudio, please

    • update your existing packages and install the additional ones by executing these lines in the R console in RStudio:
      update.packages()
      install.packages(c("devtools", "mosaic", "tidyr", "dplyr","ggvis", "rmarkdown", "shiny", "haven", "mosaicData", "manipulate", "babynames", "nycflights13", "wordcloud", "tm", "lubridate"))
      devtools::install_github("ProjectMOSAIC/mosaic", ref="beta")
      devtools::install_github("dtkaplan/DCF")
      devtools::install_github("dtkaplan/DCFinteractive")
  • Open up the communal notes for the workshop: Link to Google Doc. Add comments and questions as the workshop progresses.
  • Participant skill survey (for forming teams). Link to survey. After you press "submit", you'll see a page with a link to "Edit your response." Open that in your browser and leave it open so that you can fill in your team name when you have formed it.

Workshop outline

Welcome!

  1. Goals, topics, set-up
  2. Introduce yourself to your neighbors, and form teams for workshop activities. Follow the instructions here. Two to three people per team, with a nice mix of skills. Give your team a name so that we can refer to it. [15 min]

RStudio Quick Start

  • mosaic resources
  • panes, tabs, console
  • Lead through opening a project.
  • loading packages, install if necessary
  • bringing in data from a package
  • Data frames. We'll work with ones from mosaicData, DCF, and nycflights13 to begin with. Variables and cases. Quick summaries: glimpse(), nrow(), names(), head()
    require(mosaic)
    require(nycflights13)
    glimpse(flights)
    names(planes)
    names(airlines)
    names(airports)
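
The "loading packages, install if necessary" step above can be sketched as follows. This is a minimal illustration, not part of the workshop materials: the helper name missing_packages() is hypothetical, and the install is guarded by interactive() on the assumption that you do not want a knitted document to trigger an installation.

```r
# Hypothetical helper (not from the workshop materials):
# return the names of the requested packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("mosaic", "nycflights13")
to_install <- missing_packages(needed)

# Install only what is missing, and only in an interactive session.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Once the packages are present, library(mosaic) or require(mosaic) loads them as in the lines above.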

Visualization Quick Start

R Markdown Quick Start

  1. Quick intro to R Markdown and reproducible analysis (Prezi) [NJH, 10 min]
  2. Activity (visualization redux): Give these commands to open our template file in the editor.
    download.file(
      "http://dtkaplan.github.io/USCOTS-2015/Handouts/USCOTS-01.Rmd",
      destfile = "USCOTS-01.Rmd")
    file.edit("USCOTS-01.Rmd")
  3. Add a command to generate a visualization (using the output of Show Expression above).
  4. Compile it to HTML.
  5. Publish it. RPubs ID: USCOTS-2015. Password: written on the board.
    • Cut and paste the corresponding ggplot2 command from Show Expression output in mplot() as a chunk into your Rmd file.
    • Publish to RPubs (user ID: USCOTS-2015)
  6. Data visualization cheat sheet as handout, or follow the link.
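
The "Compile it to HTML" step is normally done with the Knit button in RStudio. As a rough console equivalent, the sketch below calls rmarkdown::render() on a tiny stand-in file. The file name and its contents are made up for illustration, and rendering assumes that the rmarkdown package and pandoc (bundled with RStudio) are available.

```r
library(rmarkdown)

# Build a tiny stand-in .Rmd file in a temporary directory.
# The file name and contents are made up for this illustration.
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_file <- file.path(tempdir(), "tiny-example.Rmd")
writeLines(c(
  "---",
  "title: \"Tiny example\"",
  "output: html_document",
  "---",
  "",
  "A scatterplot of the built-in cars data:",
  "",
  paste0(fence, "{r}"),
  "plot(dist ~ speed, data = cars)",
  fence
), rmd_file)

# render() compiles the document and returns the path of the output file.
html_file <- render(rmd_file, quiet = TRUE)
file.exists(html_file)
```

For the workshop template, the same call would be render("USCOTS-01.Rmd") once the file has been downloaded into the working directory.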

Wrangling Quick Start

  1. Notes
  2. Practice with data verbs and dplyr: An interactive app
    • Find at least one wrangling operation and setting that is interesting. Describe it here.
  3. Activity: What happened to Mary and Jane? Download and edit the template:
    download.file(
      "http://dtkaplan.github.io/USCOTS-2015/Handouts/WhatHappenedToJane/WhatHappenedToJane.Rmd",
      destfile = "WhatHappenedToJane.Rmd")
    file.edit("WhatHappenedToJane.Rmd")
    The answers are here
  4. Data Wrangling Cheat Sheet
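
As a small illustration of the data verbs practiced above, the sketch below chains filter(), group_by(), summarise(), and arrange() from dplyr. The data frame toy_names and its counts are invented for this example; they are not taken from the babynames package or from the activity template.

```r
library(dplyr)

# Made-up counts for illustration only; NOT real babynames data.
toy_names <- data.frame(
  year = c(2000, 2000, 2001, 2001, 2001),
  name = c("Mary", "Jane", "Mary", "Jane", "Ann"),
  n    = c(10, 4, 8, 5, 3),
  stringsAsFactors = FALSE
)

by_name <- toy_names %>%
  filter(name %in% c("Mary", "Jane")) %>%   # keep only the cases of interest
  group_by(name) %>%                        # one group per name
  summarise(total = sum(n),                 # collapse each group to one row
            first_year = min(year)) %>%
  arrange(desc(total))                      # largest total first

by_name
```

Each verb takes a data frame and returns a data frame, which is what makes the steps easy to chain with %>%.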

Mini case study: Bike sharing

Simple to moderately complicated data wrangling tasks in the context of bike rentals in the Washington, DC area. For quick browsing, here's the HTML file. But work with the .Rmd as shown.

  1. Download and edit the .Rmd template file. Copy and paste these commands into your R console.
    download.file(
      "http://dtkaplan.github.io/USCOTS-2015/Handouts/BikeSharing/BikeSharingBasics.Rmd",
      destfile = "BikeSharingBasics.Rmd")
    file.edit("BikeSharingBasics.Rmd")
  2. Make sure to compile the unchanged file to HTML. It's best to start with a working document then add small changes, recompile, fix errors as required, add more small changes, recompile, and so on.
  3. Links to answers: html & Rmd

Macro case study: Airline flights database

More practice with dplyr (plus joins) using the nycflights13 dataset [NJH] (20 minutes)

download.file(
  "http://dtkaplan.github.io/USCOTS-2015/Handouts/USCOTS-flights.Rmd",
  destfile = "USCOTS-flights.Rmd")
file.edit("USCOTS-flights.Rmd")
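
As a hint of what a join looks like with these tables, here is a minimal sketch, assuming the dplyr and nycflights13 packages are installed. It is not taken from the template file. It attaches the full airline name from airlines to each row of flights through the shared carrier column, then summarizes departure delays by airline.

```r
library(dplyr)
library(nycflights13)

# Add the airline's full name to each flight.
# left_join() keeps every row of flights, matched or not.
flights_named <- flights %>%
  select(carrier, flight, origin, dest, dep_delay) %>%
  left_join(airlines, by = "carrier")

# Number of flights and mean departure delay for each airline.
delay_by_airline <- flights_named %>%
  group_by(name) %>%
  summarise(n_flights = n(),
            mean_dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(n_flights))

head(delay_by_airline)
```

The same pattern works for planes (joined by tailnum) and airports (joined by matching origin or dest to faa), with the by argument naming the key columns.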

Shiny and dynamic graphics

  • Brief introduction to Shiny via Markdown [NJH] (20 minutes), plus SAT example
  • Activity: customize a default Shiny file. [DTK, 30 min]
    download.file(
      "http://dtkaplan.github.io/USCOTS-2015/Handouts/USCOTS-Shiny.Rmd",
      destfile = "USCOTS-Shiny.Rmd")
    file.edit("USCOTS-Shiny.Rmd")
  • Interacting directly with a graphic. Example: Mapping the bicycle traffic
    download.file(
      "http://dtkaplan.github.io/USCOTS-2015/Handouts/InteractiveMap.Rmd",
      destfile = "InteractiveMap.Rmd")
    file.edit("InteractiveMap.Rmd")
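
For orientation before customizing, this is a minimal sketch of the two parts of a Shiny app, a user interface and a server function. It is a standalone app written for illustration, not the contents of USCOTS-Shiny.Rmd, and it assumes the shiny package is installed. It uses the built-in faithful data.

```r
library(shiny)

# User interface: one input (a slider) and one output (a plot).
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("waitingHist")
)

# Server: redraws the histogram whenever input$bins changes.
server <- function(input, output) {
  output$waitingHist <- renderPlot({
    waiting <- faithful$waiting
    breaks <- seq(min(waiting), max(waiting), length.out = input$bins + 1)
    hist(waiting, breaks = breaks, main = NULL, xlab = "Waiting time (min)")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch the app only in an interactive session.
if (interactive()) runApp(app)
```

A natural first customization is to add a second input, such as a selectInput() for the bar color, and use it inside renderPlot().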

Data Scraping Quick Start

Framing Statistical Questions with Data Science

Discussion:

  1. Your ideas for how you might use this material in your own teaching.
  2. What support do you need moving forward?

Examples:

Topics we didn't have the time to get to.

Models and Machine Learning: for self study

  • glyph-ready is also model-ready [DTK]
  • lm() & glm(): are they machine learning?
  • “machine learning”
  • unsupervised: car data clustering
  • supervised: CART
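
For self study, the three kinds of model named above can be tried in a few lines. This is a sketch under assumptions: it uses the built-in mtcars data as a stand-in for the "car data", hierarchical clustering as one possible unsupervised method, and the rpart package (a recommended package shipped with standard R installations) for CART. The variable choices are arbitrary.

```r
library(rpart)

# Linear model: fuel economy as a function of weight and horsepower.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Unsupervised: hierarchical clustering of the cars on standardized variables.
car_dist   <- dist(scale(mtcars))
car_tree   <- hclust(car_dist)
car_groups <- cutree(car_tree, k = 3)   # cut the dendrogram into 3 groups

# Supervised: a regression tree (CART) for the same response.
fit_cart <- rpart(mpg ~ wt + hp + cyl, data = mtcars)

coef(fit_lm)
table(car_groups)
print(fit_cart)
```

All three take the same tidy, one-row-per-case data frame as input, which is the sense in which glyph-ready data is also model-ready.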

Text mining 101 [NJH]

download.file(
  "http://dtkaplan.github.io/USCOTS-2015/Handouts/USCOTS-text.Rmd",
  destfile = "USCOTS-text.Rmd")
file.edit("USCOTS-text.Rmd")