Most of your plots in ggplot2
will start from the following outline
MyData %>%
ggplot() +
geom_()
We need to
So our minimal boxplot is something like this.
HELPrct %>%
ggplot() +
geom_boxplot(aes(x = homeless, y = drugrisk))
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
We’d rather have these go horizontally. We do that by flipping the coordinates. (You can’t change the roles of x
and y
in the boxplot geom.) While we are at it, let’s get rid of that message about missing data.
HELPrct %>%
ggplot() +
geom_boxplot(aes(x = homeless, y = drugrisk), na.rm = TRUE) +
coord_flip()
This data is pretty highly skewd, so it is hard to get much information from the lower end of the boxplot and there are a lot of dots on the upper end. Let’s change to a violin plot.
HELPrct %>%
ggplot() +
geom_violin(aes(x = homeless, y = drugrisk), na.rm = TRUE) +
coord_flip()
We might like to look at color and sex as well. Let’s use color for one and faceting for the other.
HELPrct %>%
ggplot(aes(x=homeless, y = drugrisk, colour = sex)) +
geom_violin(na.rm = TRUE) +
coord_flip() +
facet_grid( . ~ substance )
HELPrct %>%
ggplot(aes(x=homeless, y = drugrisk, colour = sex)) +
geom_violin(na.rm = TRUE) +
geom_point(na.rm = TRUE) +
coord_flip() +
facet_grid( . ~ substance )
Hmm. Those points are not really where we want them to be. Jittering adds some random noise. Dodging moves entire groups left and right or up and down.
HELPrct %>%
ggplot(aes(x=homeless, y = drugrisk, colour = sex)) +
geom_violin(na.rm = TRUE) +
geom_point(position = position_jitterdodge(jitter.width = 0.2, jitter.height = 0.0, dodge.width = 0.9),
na.rm = TRUE) +
coord_flip() +
facet_grid( . ~ substance )
That’s a lot of pieces, but we built it up a step and a time, and each step isn’t that bad.
If we want to customize the particular choices of colors, sizes, shapes, etc. we do that by adjusting the scales for those aesthetics.
HELPrct %>%
ggplot(aes(x = age, y = e2b, colour = sex, shape = homeless, size = homeless)) +
# Identify the outlier by ID. Put this first so the dot is on top of the label.
geom_text(data = HELPrct %>% filter(e2b > 20), aes(label = id), size = 10, color = "yellow", hjust = 0.5, vjust = 0.5) +
geom_point(na.rm = TRUE, aes(alpha = substance)) +
scale_color_manual(values = c("red", "navy")) + # I want red and navy for the colors
scale_alpha_manual( values = c(.9, 0.1, 0.1)) + # alcohol dark, other two light
theme_minimal()
## Warning: Using size for a discrete variable is not advised.
require(NHANES)
## Loading required package: NHANES
NHANES %>%
ggplot(aes(x = Pulse)) +
geom_area(stat = "density", colour = "navy", fill = "thistle", na.rm = TRUE) +
geom_vline(xintercept = 85) +
theme_minimal()
It would be nice if the line segment stopped at the density curve. This is hard to do programatically because the height value isn’t in our data. The curve data are computed in the stat – and there may not be a value even there for exactly 85.
We can do this manually for a given plot, but that won’t adjsut well if the data (or even plot configuration) change.
require(NHANES)
NHANES %>%
ggplot(aes(x = Pulse)) +
geom_area(stat = "density", colour = "navy", fill = "thistle", na.rm = TRUE) +
geom_segment(aes(y = 0, yend = .019, x = 85, xend = 85), alpha = 0.01) +
theme_minimal()
Here’s another, perhaps not completely successful, way to separate the left and right sides of the plot.
require(NHANES)
NHANES %>%
ggplot(aes(x = Pulse)) +
geom_rect(aes(ymin = 0, ymax = .04, xmin = 85, xmax = 140), alpha = 0.01, fill = "pink") +
geom_area(stat = "density", colour = "navy", fill = "thistle", na.rm = TRUE, alpha = 0.5) +
theme_minimal()
We can actually get what Rachelle really wanted if we do things a different way. Our problem is that we don’t have access to the data used to make the density curve because it is created in the stat and hidden away deep withing the bowls of ggplot.
But if we make that data ourselves external to the plotting, we can render it however we like. The density()
function creates a list that stores x-y coordinates for the curve in its first two elements (and some additional information we don’t really need in other slots.) density()
requires us to access the pulse data in a slightly different way, however.
densityDF <-
data.frame(density(NHANES$Pulse, na.rm = TRUE)[1:2])
head(densityDF) # so you can see what the result looks like
## x y
## 1 34.63452 2.707105e-06
## 2 34.84339 3.790503e-06
## 3 35.05225 5.252754e-06
## 4 35.26112 7.242036e-06
## 5 35.46999 9.832116e-06
## 6 35.67885 1.314920e-05
With this new data frame in hand, we can create the plot we want easily
densityDF %>%
ggplot() +
geom_area(aes(x = x, y = y, fill = x > 85), alpha = 0.4, colour = "navy") +
scale_fill_manual(values = c("thistle", "navy"))
One could argue that the legend is unnecessary in this example, so let’s turn off the guide for fill.
densityDF %>%
ggplot() +
geom_area(aes(x = x, y = y, fill = x > 85), alpha = 0.4, colour = "navy") +
scale_fill_manual(values = c("thistle", "navy"), guide = FALSE)
Here’s another way to do that.
densityDF %>%
ggplot() +
geom_area(aes(x = x, y = y, fill = x > 85), alpha = 0.4, colour = "navy") +
scale_fill_manual(values = c("thistle", "navy")) +
guides(fill = "none")