The functions mScatter(), mBar(), etc. provide a convenient way to map variables to graphical attributes. They were written to make it quick and easy to explore different possibilities for presenting variables graphically. Behind the scenes, though, mScatter(), mBar(), etc. construct R-language expressions for defining the graphic. For many purposes, it’s better to write those expressions directly. Some examples of such purposes
The graphing software we use in DCF is a package called ggplot2. This same software is used in professional work. As with other software, an important first step is learning to read and interpret the commands. Once you can read them, you’ll find it easy to copy and modify existing commands to customize them to your purpose.
When using interfaces such as mScatter(), you can press the “Show Expression” button to see the resulting ggplot2 command. For instance, consider this graphic showing a subset of the NHANES data:
The graphic was originally generated using these expressions:
Small <- sample_n( NHANES, 2000 )
mScatter( Small )
Pressing the “Show Expression” button on the mScatter() menu reveals that the underlying ggplot2 expression is:
ggplot( data=Small, aes(x=age,y=height)) +
geom_point() +
aes(colour=sex)
To read and understand this command, consider this explanation:
The ggplot() function signals that a new graphic is being created (as opposed to adding on to an existing graphic).
The + symbol is used to “add” a new component to a graphic.
The argument data= sets the data table that will be used in the graphic.
The aes() function, named for the word “aesthetic,” specifies the mapping from variables to graphical attributes.
The above statement can be translated into English thus:
ggplot( data=Small, aes(x=age,y=height)): “Start a new plot based on the data in Small. In that plot, the x-coordinate will represent the variable age, while the y-coordinate will be height.”
geom_point(): “Use dots as the glyph …”
aes( colour=sex ): “… and, come to think of it, for any of the glyphs in the graph, sex should be used to set the color.”
The different phrases in the R expression are connected with the + sign, meaning “do this and do that.” If you put the phrases on different lines, the + must always come at the end of the preceding line. Doing so tells R that the expression is not yet complete. Of course, the final phrase should not be followed by + because, at the end of that phrase, the entire expression is complete.
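The rule about where the + goes can be seen in a small, self-contained sketch. The Small table below is a made-up stand-in (its values are invented, not drawn from NHANES), so that the example runs without the course data.

```r
library(ggplot2)

# Made-up stand-in for the Small table; these values are NOT from NHANES.
Small <- data.frame(
  age    = c(5, 12, 19, 33, 47, 60),
  height = c(110, 150, 172, 165, 180, 158),
  sex    = c("female", "male", "male", "female", "male", "female")
)

# Each "+" ends its line, which tells R the expression continues below.
p <- ggplot(data = Small, aes(x = age, y = height)) +
  geom_point() +
  aes(colour = sex)   # final phrase: no trailing "+"

p  # printing the object draws the graphic
```

If a line instead began with +, R would treat the line above it as a complete expression, and the later phrases would not be attached to the plot.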
Once you can read the expressions, you can figure out how to modify them to produce the graphic you want. For instance, you can substitute other variables for any or all of age, height, and sex.
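As a sketch of such a substitution, the expression below keeps the same structure but maps different variables. The table and its weight and smoker columns are hypothetical, made up for illustration; with real data you would use whatever variable names your table contains.

```r
library(ggplot2)

# Made-up table for illustration only; column names and values are assumptions.
Small <- data.frame(
  age    = c(21, 34, 45, 52, 63, 70),
  weight = c(62, 80, 75, 91, 68, 77),
  smoker = c("no", "yes", "no", "no", "yes", "no")
)

# Same three phrases as before, with weight on the y-axis and smoker as colour.
p <- ggplot(data = Small, aes(x = age, y = weight)) +
  geom_point() +
  aes(colour = smoker)

p
```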
You don’t need to use functions like mScatter() at all. Here are a few templates for different kinds of graphs. Often, once you have chosen the kind of graph you want to make, modifications are as simple as changing the name of the data table and the variables.
On occasion, data from more than one data table are graphed together. For instance, suppose you want a display of one state’s hospital providers’ charges for different Diagnosis-Related Groups (DRGs). Such a display might look like this:
This chart uses bars to give a fair impression of the range in charges for different medical procedures in New Jersey.
But how do these charges compare to those in other states? One way to display this is to add another layer showing the individual states.
With the context of the individual states, it’s easy to see the charges in New Jersey are among the highest, and often the very highest, in the country for each DRG.
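One way to layer a second data table onto a plot is to give a geom its own data= argument. The sketch below is not the code behind the charts described above: the tables OneState and AllStates, the DRG labels, and the dollar amounts are all invented for illustration.

```r
library(ggplot2)

# Made-up illustration data: the DRG labels, states, and charges are invented
# and do not come from the actual hospital charge tables.
AllStates <- data.frame(
  drg    = rep(c("DRG A", "DRG B", "DRG C"), each = 4),
  state  = rep(c("NJ", "NY", "PA", "OH"), times = 3),
  charge = c(90, 70, 55, 40,  60, 58, 35, 30,  120, 95, 80, 75)
)
OneState <- subset(AllStates, state == "NJ")

p <- ggplot(data = OneState, aes(x = drg, y = charge)) +
  geom_col(fill = "gray80") +                 # bars drawn from the one-state table
  geom_point(data = AllStates, alpha = 0.6)   # dots drawn from the all-states table

p
```

The second layer inherits the x and y mappings from ggplot(), so only the data table changes between the two layers.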
When you want to compare patterns in other variables across different levels of a variable, facets can be a simple, effective approach. A couple of examples will suffice. (Note: Make sure to notice the tilde ~ in the argument to the facet_wrap() command.)
Small <- sample_n( NHANES, size=3000 )
Small %>% ggplot( aes( x=age, y=height )) +
geom_point( aes( color=sex )) +
facet_wrap( ~ death ) # Note the tilde ~
By default, facet_wrap() will maintain the same x- and y-scales for every facet. This is good practice in general; it makes the different levels of the faceting variable easy to compare. But sometimes you may want one or both axis scales to adjust to the data in just that facet. For this, pass facet_wrap() one of these arguments: scales="free", scales="free_x", or scales="free_y".
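The difference between shared and free scales can be sketched as follows. The full name of the argument in ggplot2 is scales. The Small table here is a made-up stand-in with invented values, and the levels of death are assumptions chosen only to give two facets.

```r
library(ggplot2)

# Made-up stand-in data; values and the levels of `death` are invented.
Small <- data.frame(
  age    = c(2, 6, 10, 25, 40, 70, 35, 50, 65, 80),
  height = c(85, 115, 138, 170, 176, 165, 160, 182, 158, 171),
  sex    = rep(c("female", "male"), times = 5),
  death  = c(rep("alive", 6), rep("deceased", 4))
)

# Default: every facet shares the same x- and y-scales.
p_fixed <- ggplot(Small, aes(x = age, y = height)) +
  geom_point(aes(color = sex)) +
  facet_wrap(~ death)

# Free y-scale: each facet's y-axis fits only the data in that facet.
p_free <- ggplot(Small, aes(x = age, y = height)) +
  geom_point(aes(color = sex)) +
  facet_wrap(~ death, scales = "free_y")

p_fixed
p_free
```

Printing the two plots side by side shows the trade-off: free scales use the space in each facet more fully, but the facets can no longer be compared by position alone.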