# Chapter 6 Frames, glyphs, and other components of graphics

Data graphics are built from parts. Chapter 5 showed the parts assembled together. This chapter looks at the parts individually.

Of course, a data frame provides the basis for drawing a data graphic. The relationship between a data frame and a graphic is simple: Each case in the data frame becomes a mark in the graph. The designer of the graphic — you — chooses which variables the graphic will display and how each variable is to be represented graphically: position, size, color, and so on. The marks themselves are called glyphs. A data graphic has one glyph for each case in the data frame.

Key Graphics Vocabulary

frame: The relationship between position and the data being plotted.

glyph: The basic graphical “unit” that represents one case. Other terms used include “mark” and “symbol.” Variables set the graphical attributes of the glyph: size, color, shape, and so on. The location of the glyph (location is an important graphical attribute!) is set by the two variables defining the frame.

aesthetic: Any graphical attribute of a glyph: size, location, shape, color, etc.

scale: The relationship between the value of a variable and the graphical attribute to be displayed for that value.

guide: A visual indication of the scale that shows a human viewer how a variable is encoded into its graphical attribute. Common guides are x- and y-axis tick marks and color keys.

## 6.1 The Frame

The frame of a graphic provides the space for drawing glyphs. But there is more to a frame than a blank canvas or piece of paper. The frame defines what position means. Most often, the frame is a rectangular region and position is described in terms of the familiar $$(x, y)$$ Cartesian coordinate system. In creating a frame, you must decide which variable in your data will correspond to the $$x$$ coordinate, and which to the $$y$$ coordinate.

For instance, consider a dataset relevant to economic productivity. Table 6.1 gives per capita GDP for each country, along with some candidate explanatory variables: average educational level in the population, length of roadways per unit area, and Internet use as a fraction of the population.

Table 6.1: Data relevant to economic performance. The complete table is available at http://tiny.cc/dcf/table-6-2.csv.

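| Country | GDP per capita | Education | Roadways per unit area | Internet use |
|---------|----------------|-----------|------------------------|--------------|
| Ethiopia | 1223.18 | 4.7 | 0.04 | >0% |
| Finland | 37105.23 | 6.8 | 0.23 | >60% |
| Gambia, The | 1910.13 | 4.1 | 0.33 | >5% |
| India | 4036.09 | 3.2 | 1.43 | >0% |
| Macau | 87904.01 | 2.7 | 14.75 | >35% |
| Yemen | 2365.57 | 5.2 | 0.14 | >5% |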

You define a frame by selecting two variables from the glyph-ready data frame. For instance, Figure 6.1 shows a frame based on GDP and length of roadways. The frame gives meaning to location in space.
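
To make the idea concrete, here is a minimal sketch in plain Python (no plotting library). It stores the six rows of Table 6.1 as a list of dictionaries and defines a frame simply by naming the two variables that supply the x and y coordinates. The field names `gdp`, `educ`, and `roadways` and the function `frame_positions` are hypothetical, chosen for this sketch; only `net_users` is a name used elsewhere in the chapter.

```python
# Six cases from Table 6.1, one dictionary per country.
# Field names other than net_users are hypothetical, chosen for this sketch.
rows = [
    {"country": "Ethiopia",    "gdp": 1223.18,  "educ": 4.7, "roadways": 0.04,  "net_users": ">0%"},
    {"country": "Finland",     "gdp": 37105.23, "educ": 6.8, "roadways": 0.23,  "net_users": ">60%"},
    {"country": "Gambia, The", "gdp": 1910.13,  "educ": 4.1, "roadways": 0.33,  "net_users": ">5%"},
    {"country": "India",       "gdp": 4036.09,  "educ": 3.2, "roadways": 1.43,  "net_users": ">0%"},
    {"country": "Macau",       "gdp": 87904.01, "educ": 2.7, "roadways": 14.75, "net_users": ">35%"},
    {"country": "Yemen",       "gdp": 2365.57,  "educ": 5.2, "roadways": 0.14,  "net_users": ">5%"},
]


def frame_positions(data, x_var, y_var):
    """Return one (x, y) position per case: the frame assigns x_var to the
    horizontal coordinate and y_var to the vertical coordinate."""
    return [(case[x_var], case[y_var]) for case in data]


# A frame like that of Figure 6.1: roadways horizontally, GDP vertically.
positions = frame_positions(rows, "roadways", "gdp")
for case, (x, y) in zip(rows, positions):
    print(f"{case['country']:<12} x={x:<6} y={y}")
```

The positions are still in data units (roadway density, dollars of GDP); turning them into locations on a page or screen is the job of a scale.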

## 6.2 Glyphs

The frame itself doesn’t display any of the cases. Instead, the glyphs positioned in the frame represent the cases. There will be one glyph for each case in the data frame.

The glyph used in scatter plots is usually a simple shape: a dot, a square, a triangle, an x, and so on. Figure 6.2 uses small dots. Since each case is a country, each dot represents one country.

In Figure 6.2, the glyphs are simple. Only position in the frame distinguishes one glyph from another; the shape, size, and so on are identical for all of the glyphs. Nothing about the glyph itself identifies the country. A glyph can, however, have several attributes. Figure 6.3 uses two, location and label, mapping country name to the label.

The aspects of a glyph that we can perceive are called aesthetics, or equivalently graphical attributes. Applied to glyphs, the word aesthetics is not used in its modern sense. Nowadays, most people associate aesthetics with notions of beauty and artistic taste. The older meaning of the word, properties relating to perception by the senses, is the one intended when it comes to glyphs.

Location in the frame is given by the $$(x, y)$$ aesthetics of a glyph, but other aesthetics can also display variables from the data frame. For instance, color can show Internet use (as a fraction of the population), as in Figure 6.4. Another aesthetic is size. In Figure 6.4 the size is fixed, the same for every country. Figure 6.5 maps the average years of education onto the size aesthetic.

## 6.3 Scales and Guides

There are four aesthetics in Figure 6.5. Each of the four aesthetics is set in correspondence with a variable; we say the variable is mapped to the aesthetic. Length of roadways is mapped to horizontal position, GDP to vertical position, Internet connectivity to color, and educational attainment to size.
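
The four mappings just listed can be written down as a small table from aesthetic to variable. The sketch below, a plain-Python illustration under the same assumptions as any data-frame example here (hypothetical field names `gdp`, `educ`, `roadways`; hypothetical helper `make_glyphs`), builds one glyph per case by looking up each mapped variable. It stops before the scale step, so each glyph still holds data values rather than pixels or colors.

```python
# Cases from Table 6.1; field names other than net_users are hypothetical.
rows = [
    {"country": "Ethiopia",    "gdp": 1223.18,  "educ": 4.7, "roadways": 0.04,  "net_users": ">0%"},
    {"country": "Finland",     "gdp": 37105.23, "educ": 6.8, "roadways": 0.23,  "net_users": ">60%"},
    {"country": "Gambia, The", "gdp": 1910.13,  "educ": 4.1, "roadways": 0.33,  "net_users": ">5%"},
    {"country": "India",       "gdp": 4036.09,  "educ": 3.2, "roadways": 1.43,  "net_users": ">0%"},
    {"country": "Macau",       "gdp": 87904.01, "educ": 2.7, "roadways": 14.75, "net_users": ">35%"},
    {"country": "Yemen",       "gdp": 2365.57,  "educ": 5.2, "roadways": 0.14,  "net_users": ">5%"},
]

# The mappings of Figure 6.5, written as aesthetic -> variable.
mapping = {"x": "roadways", "y": "gdp", "color": "net_users", "size": "educ"}


def make_glyphs(data, mapping):
    """Build one glyph per case. Each aesthetic takes the value of the
    variable mapped to it; a scale would later turn these values into
    positions, colors, and dot sizes."""
    return [{aes: case[var] for aes, var in mapping.items()} for case in data]


glyphs = make_glyphs(rows, mapping)
print(len(glyphs))  # one glyph per case: 6
print(glyphs[1])    # Finland: {'x': 0.23, 'y': 37105.23, 'color': '>60%', 'size': 6.8}
```

Keeping the mapping as data, separate from the rows, makes it easy to try a different assignment (say, education to color) without touching the data frame.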

A scale is the relationship between a variable and the aesthetic to which it is mapped. For roadways, which is mapped to horizontal position, the scale says what value of the variable corresponds to the left edge of the frame, what value corresponds to the right edge, and where values in between fall.

Not all scales are about position. For instance, in Figure 6.5, net_users is translated to color. Similarly, average educational attainment (in years) is translated to size: the middle-sized dot corresponds to 7½ years of education.
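
Both kinds of scale can be sketched as functions. Below, a linear scale maps a numeric interval of data values onto an interval of drawing coordinates, and a discrete scale looks up an output (here a color) for each category. The frame size, the data domains, and the hex colors are all hypothetical choices for illustration; plotting software typically works out the domain from the data, whereas here it is fixed by hand so the arithmetic is easy to follow.

```python
def linear_scale(domain, range_):
    """Return a function that maps the data interval `domain` linearly onto
    the drawing interval `range_` (both given as (start, end) pairs)."""
    (d0, d1), (r0, r1) = domain, range_

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


def discrete_scale(levels, outputs):
    """Return a function that looks up the output paired with each level."""
    lookup = dict(zip(levels, outputs))
    return lambda level: lookup[level]


# Hypothetical frame 600 pixels wide and 500 pixels tall. Screen y grows
# downward, so GDP 0 sits at the bottom (y = 500) and 100,000 at the top (y = 0).
x_scale = linear_scale(domain=(0, 15), range_=(0, 600))        # roadways
y_scale = linear_scale(domain=(0, 100_000), range_=(500, 0))   # GDP

# Hypothetical colors for the net_users levels that appear in Table 6.1.
color_scale = discrete_scale(
    [">0%", ">5%", ">35%", ">60%"],
    ["#deebf7", "#9ecae1", "#4292c6", "#08519c"],
)

print(x_scale(7.5), y_scale(50_000), color_scale(">60%"))  # 300.0 250.0 #08519c
```

The reversed range for `y_scale` is how the scale encodes "which value goes at the bottom of the frame and which at the top."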

Scales translate values into aesthetic properties. Guides help the human reader to do the back translation. For position aesthetics, the most common sort of guide is the familiar axis with its tick marks and labels. But notice also the guide that tells how dot color corresponds to Internet connectivity. There’s still another guide telling how dot size corresponds to education.
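
A guide can be thought of as a scale run in reverse for the reader's benefit. The sketch below, again a plain-Python illustration rather than any library's API, pairs a few chosen data values (tick marks) with the positions the scale assigns them, builds a simple color key, and shows the inverse mapping a reader performs when reading a value off the axis. The GDP domain, frame height, tick values, and colors are hypothetical.

```python
def linear_scale(domain, range_):
    """Map the data interval `domain` linearly onto the drawing interval `range_`."""
    (d0, d1), (r0, r1) = domain, range_
    return lambda value: r0 + (value - d0) * (r1 - r0) / (d1 - d0)


def invert(domain, range_):
    """The back translation a reader performs: drawing position -> data value."""
    (d0, d1), (r0, r1) = domain, range_
    return lambda pos: d0 + (pos - r0) * (d1 - d0) / (r1 - r0)


def axis_guide(scale, tick_values):
    """An axis guide: each tick's position paired with the data value it labels."""
    return [(scale(v), f"{v:,}") for v in tick_values]


def color_key(levels, colors):
    """A color-key guide: one (color, label) entry per level."""
    return list(zip(colors, levels))


# A hypothetical GDP scale: 0 at the bottom of the frame (y = 500), 100,000 at the top (y = 0).
gdp_domain, gdp_range = (0, 100_000), (500, 0)
y_scale = linear_scale(gdp_domain, gdp_range)
y_axis = axis_guide(y_scale, [0, 25_000, 50_000, 75_000, 100_000])
for position, label in y_axis:
    print(f"tick at y={position:>5.1f}  label {label}")

# Hypothetical colors for the net_users levels that appear in Table 6.1.
key = color_key([">0%", ">5%", ">35%", ">60%"], ["#deebf7", "#9ecae1", "#4292c6", "#08519c"])
print(key)

# A reader who sees a dot at y = 250 can translate it back to GDP.
print(invert(gdp_domain, gdp_range)(250))  # 50000.0
```

The tick values are picked to be round numbers in data units, not round pixel positions, because the guide exists to be read in terms of the variable.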

## 6.4 Facets

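Using multiple aesthetics such as shape, color, and size to display multiple variables can produce a confusing, hard-to-read graph. Facets provide a simple and effective alternative: the cases are split into groups according to the level of one variable, and each group is drawn in its own panel using the same frame. Figure 6.6 uses facets to show the different levels of Internet connectivity, one panel per level, providing a clearer view than Figure 6.5.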
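
Faceting amounts to grouping the cases by one variable and drawing each group separately. Here is a minimal sketch with Python's standard library, assuming the Table 6.1 rows with hypothetical field names. It also computes the GDP range over all cases rather than panel by panel, since panels often share a single set of scales so that positions can be compared across panels.

```python
from collections import defaultdict

# Cases from Table 6.1; field names other than net_users are hypothetical.
rows = [
    {"country": "Ethiopia",    "gdp": 1223.18,  "roadways": 0.04,  "net_users": ">0%"},
    {"country": "Finland",     "gdp": 37105.23, "roadways": 0.23,  "net_users": ">60%"},
    {"country": "Gambia, The", "gdp": 1910.13,  "roadways": 0.33,  "net_users": ">5%"},
    {"country": "India",       "gdp": 4036.09,  "roadways": 1.43,  "net_users": ">0%"},
    {"country": "Macau",       "gdp": 87904.01, "roadways": 14.75, "net_users": ">35%"},
    {"country": "Yemen",       "gdp": 2365.57,  "roadways": 0.14,  "net_users": ">5%"},
]


def facet(data, facet_var):
    """Split the cases into panels, one panel per level of facet_var.
    Panels appear in the order their level is first seen."""
    panels = defaultdict(list)
    for case in data:
        panels[case[facet_var]].append(case)
    return dict(panels)


panels = facet(rows, "net_users")

# A shared scale is built from all cases, not from each panel separately.
gdp_range = (min(c["gdp"] for c in rows), max(c["gdp"] for c in rows))

for level, cases in panels.items():
    print(level, [c["country"] for c in cases])
print("shared GDP range:", gdp_range)
```

Every case lands in exactly one panel, so faceting trades the color aesthetic for position on the page without dropping any data.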

## 6.5 Layers

On occasion, data from more than one data frame are graphed together. For instance, suppose you want to display the charges made by hospital providers in one state for different medical procedures. The glyph-ready data frame for New Jersey looks like Table 6.2. It can be translated into a bar chart (the top panel of Figure 6.7) that gives a fair impression of the range of charges for different medical procedures in New Jersey.

Table 6.2: Glyph-ready data for the barplot layer in Figure 6.7

| drg | stateProvider | mean_charge |
|-----|---------------|-------------|
| 536 | NJ | 31390.41 |
| 303 | NJ | 32371.78 |
| 310 | NJ | 33041.72 |
| 313 | NJ | 33183.08 |
| 305 | NJ | 33277.72 |
| 203 | NJ | 33886.60 |

… and so on for 100 rows altogether.

How do the New Jersey charges compare to those in other states? Tables 6.2 and 6.3 provide the relevant data. The two data frames, one for New Jersey and one for the whole country, can be plotted in the same frame with different types of glyph, as in Figure 6.8: bars for New Jersey and dots for the whole country. Each data frame forms its own layer of the graphic.

Table 6.3: Glyph-ready data frame for the scatter-plot layer in Figure 6.8

| drg | stateProvider | mean_charge |
|-----|---------------|-------------|
| 039 | AK | 34805.13 |
| 039 | AL | 32044.44 |
| 039 | AR | 27463.27 |
| 039 | AZ | 33443.36 |
| 039 | CA | 56094.93 |
| 039 | CO | 35252.21 |

… and so on for 5,025 rows altogether.

With the context provided by the individual states, it’s easy to see that the charges in New Jersey are among the highest in the country for each medical procedure. (A description of each medical procedure number is given in the data frame DirectRecoveryGroups in the DataComputing package.)
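
The layered structure of Figure 6.8 can be sketched as a list of layers, each pairing its own data frame with its own glyph type, while all layers share one frame. The sketch below uses only the six rows of each table shown above, not the full data, and assumes, as a bar chart of charges per procedure suggests, that `drg` is on the horizontal axis and `mean_charge` on the vertical. Because the layers share the frame, the scales must cover the values from every layer; the helper names `shared_domain` and `shared_levels` are hypothetical.

```python
# The rows of Tables 6.2 and 6.3 shown in the text (the full tables are longer).
# drg codes are kept as strings so the leading zero in "039" survives.
nj = [
    {"drg": "536", "stateProvider": "NJ", "mean_charge": 31390.41},
    {"drg": "303", "stateProvider": "NJ", "mean_charge": 32371.78},
    {"drg": "310", "stateProvider": "NJ", "mean_charge": 33041.72},
    {"drg": "313", "stateProvider": "NJ", "mean_charge": 33183.08},
    {"drg": "305", "stateProvider": "NJ", "mean_charge": 33277.72},
    {"drg": "203", "stateProvider": "NJ", "mean_charge": 33886.60},
]
national = [
    {"drg": "039", "stateProvider": "AK", "mean_charge": 34805.13},
    {"drg": "039", "stateProvider": "AL", "mean_charge": 32044.44},
    {"drg": "039", "stateProvider": "AR", "mean_charge": 27463.27},
    {"drg": "039", "stateProvider": "AZ", "mean_charge": 33443.36},
    {"drg": "039", "stateProvider": "CA", "mean_charge": 56094.93},
    {"drg": "039", "stateProvider": "CO", "mean_charge": 35252.21},
]

# Each layer pairs a data frame with its own glyph type; all layers share the frame.
layers = [
    {"name": "New Jersey", "data": nj, "glyph": "bar"},
    {"name": "All states", "data": national, "glyph": "point"},
]


def shared_domain(layers, var):
    """Numeric range of `var` across every layer, so one scale fits all layers."""
    values = [case[var] for layer in layers for case in layer["data"]]
    return min(values), max(values)


def shared_levels(layers, var):
    """Sorted categorical levels of `var` across every layer."""
    return sorted({case[var] for layer in layers for case in layer["data"]})


y_domain = shared_domain(layers, "mean_charge")
x_levels = shared_levels(layers, "drg")
for layer in layers:
    print(f"{layer['name']}: {len(layer['data'])} {layer['glyph']} glyphs")
print("y domain:", y_domain)
print("x levels:", x_levels)
```

If each layer built its own scale, a bar and a dot at the same height could stand for different charges; computing the domain over all layers is what makes the comparison in Figure 6.8 meaningful.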