Graphics Expressions

The functions mScatter(), mBar(), etc. provide a convenient way to map variables to graphical attributes. They were written to make it easy and quick to explore different possibilities for presenting variables graphically. Behind the scenes, though, mScatter(), mBar(), etc. construct R-language expressions that define the graphic. For many purposes, it’s better to write those expressions directly. Some examples of such purposes:

to generate and document a data graphic using R/Markdown.
to add glyph layers to a graphic.
to refine labels, colors, etc.

Reading GGPlot Commands

The graphing software we use in DCF is a package called ggplot2. This same software is used in professional work.¹ As with other software, an important first step is learning to read and interpret the commands. Once you can read them, you’ll find it easy to copy existing commands and modify them to suit your purpose.

When you use an interface such as mScatter(), pressing the “Show Expression” button displays the ggplot2 command that produces the current plot. For instance, consider this graphic showing a subset of the NHANES data:

[Figure: scatter plot of height versus age, colored by sex, for a sample of the NHANES data]

The graphic was originally generated using these expressions:

Small <- sample_n( NHANES, 2000 )
mScatter( Small )

Pressing the “Show Expression” button on the mScatter() menu² reveals that the underlying ggplot2 expression is:

ggplot( data=Small, aes(x=age,y=height)) + 
  geom_point() + 
  aes(colour=sex)

[Figure: the scatter plot produced by the ggplot2 expression above]

To read and understand this command, consider this explanation:

The ggplot() function signals that a new graphic is being created (as opposed to adding on to an existing graphic).
The + symbol is used to “add” a new component to a graphic.
Glyphs are created by functions called geoms. Here, the geom for dots is being used.
The argument, data=, sets the data table that will be used in the graphic.
The aes() function — based on the word “aesthetic” — specifies the mapping from variables to graphical attributes.

The above statement can be translated into English thus:

ggplot( data=Small, aes(x=age,y=height)): “Start a new plot based on the data in Small. In that plot, the x-coordinate will represent the variable age, while the y-coordinate will be height.”
geom_point(): “Use dots as the glyph …”
aes( colour=sex ): “… and, come to think of it, for any of the glyphs in the graph, sex should be used to set the color.”
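
The phrase-by-phrase reading above can be tried out by building the graphic one piece at a time. The following is only a sketch: the Small table here is a simulated stand-in with made-up values (so that the code runs without the NHANES data), and the name p is chosen just for illustration.

```r
library(ggplot2)

# Simulated stand-in for the Small table (NOT real NHANES values).
set.seed(1)
n <- 200
Small <- data.frame(
  age = round(runif(n, min = 2, max = 80)),
  sex = sample(c("female", "male"), n, replace = TRUE)
)
# Made-up height curve: rises until about age 18, then stays flat.
Small$height <- 0.85 + 0.05 * pmin(Small$age, 18) + rnorm(n, sd = 0.07)

# "Start a new plot based on Small; x is age, y is height."
p <- ggplot(data = Small, aes(x = age, y = height))

# "Use dots as the glyph ..."
p <- p + geom_point()

# "... and use sex to set the color of any glyph in the graph."
p <- p + aes(colour = sex)

print(p)  # drawing happens only when the finished object is printed
```

Storing the intermediate result in p is optional; chaining the three phrases with + in a single expression gives the same graphic.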

The different phrases in the R expression are connected with the + sign, meaning “do this and do that.” If you put the phrases on different lines, each + must come at the end of the preceding line, never at the start of the next one. Ending a line with + tells R that the expression is not yet complete. Of course, the final phrase should not be followed by + because, at the end of that phrase, the entire expression is complete.
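
A small sketch of the line-break rule, using a tiny made-up table in place of Small. The incorrect form is left inside comments so that the script still runs.

```r
library(ggplot2)

# Tiny made-up table, for illustration only.
Small <- data.frame(
  age    = c(5, 20, 40, 60),
  height = c(1.10, 1.70, 1.75, 1.68)
)

# Correct: every line that is continued ends with +
p <- ggplot(data = Small, aes(x = age, y = height)) +
  geom_point()
print(p)

# Incorrect (do not write it this way):
#   ggplot(data = Small, aes(x = age, y = height))
#     + geom_point()
# R treats the first line as a complete expression, so it draws a frame
# with no glyphs. The second line is then evaluated on its own and
# produces an error, because nothing stands to the left of the +.
```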

Once you can read the expressions, you can figure out how to modify them to produce the graphic you want. For instance, you can substitute other variables for any or all of age, height, and sex.
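
As a sketch of such a substitution, the expression below keeps the same three phrases but swaps in different variables. The columns weight and smoker are assumptions made for this example, and the table is simulated rather than taken from NHANES.

```r
library(ggplot2)

# Simulated stand-in table; the columns weight and smoker are hypothetical.
set.seed(2)
n <- 150
Small <- data.frame(
  height = rnorm(n, mean = 1.68, sd = 0.10),
  smoker = sample(c("no", "yes"), n, replace = TRUE)
)
Small$weight <- 22 * Small$height^2 + rnorm(n, sd = 8)  # made-up relationship

# Same structure as before; only the variable names have changed.
p <- ggplot(data = Small, aes(x = height, y = weight)) +
  geom_point() +
  aes(colour = smoker)
print(p)
```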

Some Basic Graphics

You don’t need to use functions like mScatter() at all. Here are a few templates for different kinds of graphs. Often, once you have chosen the kind of graph you want to make, modifications are as simple as changing the name of the data table and the variables.
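
As an illustration of what such templates can look like, here are three short sketches: a scatter plot, a bar chart of counts, and a density plot. They are examples written for this purpose, run on a simulated stand-in table, and the object names scatter, bars, and dens are arbitrary. To reuse one, change the data table and the variable names.

```r
library(ggplot2)

# Simulated stand-in table (NOT real NHANES values).
set.seed(3)
n <- 300
Small <- data.frame(
  age    = round(runif(n, min = 18, max = 80)),
  height = rnorm(n, mean = 1.68, sd = 0.10),
  sex    = sample(c("female", "male"), n, replace = TRUE)
)

# Scatter plot: two quantitative variables.
scatter <- ggplot(data = Small, aes(x = age, y = height)) +
  geom_point()

# Bar chart: number of cases at each level of a categorical variable.
bars <- ggplot(data = Small, aes(x = sex)) +
  geom_bar()

# Density plot: distribution of one quantitative variable, one curve per group.
dens <- ggplot(data = Small, aes(x = height)) +
  geom_density(aes(fill = sex), alpha = 0.5)

print(scatter)
print(bars)
print(dens)
```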

Layers

On occasion, data from more than one data table are graphed together. For instance, suppose you want a display of one state’s hospital providers’ charges for different Diagnosis-Related Groups (DRGs). Such a display might look like this:

[Figure: bar chart of hospital charges for different DRGs in New Jersey]

This chart uses bars to give a fair impression of the range in charges for different medical procedures in New Jersey.

But how do these charges compare to those in other states? One way to display this is to add another layer showing the individual states.

[Figure: the New Jersey bar chart with an added layer showing the individual states]

With the context of the individual states, it’s easy to see the charges in New Jersey are among the highest, and often the very highest, in the country for each DRG.
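
A layered graphic of this kind can be sketched by giving the second geom its own data= argument. Everything in the example is hypothetical: the tables AllStates and NJ, the columns drg, state, and charge, and the numbers themselves are invented for illustration and are not real charge data.

```r
library(ggplot2)

# Invented numbers for illustration only; not real hospital charges.
AllStates <- data.frame(
  drg    = rep(c("A", "B", "C"), each = 4),
  state  = rep(c("NJ", "NY", "PA", "OH"), times = 3),
  charge = c(90, 70, 55, 40,   60, 58, 35, 30,   120, 80, 75, 62)
)
NJ <- subset(AllStates, state == "NJ")

p <- ggplot(data = NJ, aes(x = drg, y = charge)) +
  geom_col(fill = "gray70") +                 # layer 1: bars, drawn from NJ
  geom_point(data = AllStates, alpha = 0.6)   # layer 2: dots, drawn from AllStates
print(p)
```

The second layer inherits the x and y mappings set in ggplot() and changes only the data table, so both tables need to use the same variable names.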

Facets

When you want to compare patterns in other variables across different levels of a variable, facets can be a simple, effective approach. A couple of examples will suffice. (Note: Make sure to notice the tilde ~ in the argument to the facet_wrap() command.)

Small <- sample_n( NHANES, size=3000 )
Small %>% ggplot( aes( x=age, y=height )) +
  geom_point( aes( color=sex )) +
  facet_wrap( ~ death ) # Note the tilde ~

[Figure: scatter plots of height versus age, colored by sex, with one facet for each level of death]

By default, facet_wrap() will maintain the same x- and y-scales for every facet. This is a good practice in general; it makes the different levels of the faceting variable easy to compare. But sometimes you may want one or both axis scales to adjust to the data in just that facet. For this, give facet_wrap() one of these choices as an argument: scales="free" or scales="free_x" or scales="free_y".
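
A brief sketch of a free y-axis, again on a simulated stand-in table. The levels of death used here are made up, and the full name of the argument in ggplot2 is scales.

```r
library(ggplot2)

# Simulated stand-in table (NOT real NHANES values; the levels of death are made up).
set.seed(4)
n <- 300
Small <- data.frame(
  age    = round(runif(n, min = 18, max = 80)),
  height = rnorm(n, mean = 1.68, sd = 0.10),
  sex    = sample(c("female", "male"), n, replace = TRUE),
  death  = sample(c("alive", "deceased"), n, replace = TRUE)
)

p <- ggplot(data = Small, aes(x = age, y = height)) +
  geom_point(aes(color = sex)) +
  facet_wrap(~ death, scales = "free_y")  # y-axis set per facet; x-axis shared
print(p)
```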

¹ Indeed, when you feel proficient with ggplot2, it’s worthwhile to include that on your résumé.
² This is the menu that appears when you give the expression mScatter( Small ). If it’s not immediately visible, press the small “gear” icon in the plot.