R Programming: Parts of Speech

Command chains

Your commands will be written as chains.

  • Each link in the chain will be a data verb and its arguments.
    • The very first link is usually a data frame.
  • Links are connected by the chaining symbol %>%

  • Often, but not always, you will save the output of the chain in a named object.
    • This is done with the assignment operator, <-

      Name_of_result <-     
        Starting_data_frame  %>%    
        first verb (arguments for details) %>%   
        next verb (and its arguments) %>%    
         ... and so on, up through ...   
        last verb (and its arguments)  

An example command chain

Princes <- 
  BabyNames %>%
  filter(grepl("Prince", name)) %>%
  group_by(year) %>%
  summarise(total = sum(count))
  • A good idea to put each link on its own line
  • Note that %>% is at the end of each line.
    • Except … Princes <- is assignment
    • Except … The last line has no %>%.

Syntax and semantics

There are two distinct aspects involved in reading or writing a command chain.

  1. Syntax: the grammar of the command
  2. Semantics: the meaning of the command

The focus today is on syntax.

Part of Speech

From the dictionarty

part of speech noun

parts of speech a category to which a word is assigned in accordance with its syntactic functions. In English the main parts of speech are noun, pronoun, adjective, determiner, verb, adverb, preposition, conjunction, and interjection.

Parts of Speech in R

  1. Data frames
  2. Functions
  3. Arguments
  4. Variables
  5. Constants
  6. Assignment
  7. Formulas (we won’t use these until the end)

Data frames

  • A data frame comprises one or more variables.
  • Naming convention: data frames are given names that start with a CAPITAL LETTER, e.g., RegisteredVoters.
  • A data frame will always be the input at the start of a command chain.
  • If assignment is used to save the result, the object created is usually a data frame.

Functions

  • Functions are objects that transform an input into an output.
  • Functions are always followed by parentheses, that is, an opening ( and, eventually, a closing ).
  • Each link in a command chain starts with a function.
    • More specifically, the function is a data verb that takes a data frame as input and produces another data frame as output.
    • There are other kinds of functions, e.g. summary (or reduction) functions and transformation functions.

Arguments

The things that go inside a function’s parentheses are called arguments.

  • Arguments describe the details of what a function is to do.
  • If there are multiple arguments, they are always separated by commas.
  • Many functions take named arguments which look like a name followed by an = sign, e.g.

    summarise(total = sum(count))

You can also consider the data frame passed along by %>% as an argument to the following function.

Variables

Variables are the components of data frames.

  • When they are used, they always appear in function arguments, that is, between the function’s parentheses.
  • A good convention is for variables to have names that start with a lower-case letter. The convention is not universally followed.
  • Variables will never be followed by (.

Constants

Constants are single values, most commonly a number or a character string.

  • Character strings will always be in quotation marks,
    "like this."
  • Numerals are the written form of numbers, for instance.
    -42 1984 3.14159

Discussion Problem

Consider this command chain:

Princes <- 
  BabyNames %>%
  filter(grepl("Prince", name)) %>%
  group_by(year) %>%
  summarise(total = sum(count))

Just from the syntax, you should be able to tell which of the five different kinds of object each of these things is: Princes, BabyNames, filter, grepl, "Prince", name, group_by, year, summarise, total, sum, count.

Explain your reasoning.