Consider the Minneapolis2013 election data. Presumably, there is some relationship between a voter’s first and second choices. This could be visualized as a network: each candidate is a node, and an edge connects each pair of candidates who were commonly selected together.
Here’s an analysis that computes, for each first-choice candidate, the fraction of that candidate’s ballots that named each second choice:
PairFrac <- Minneapolis2013 %>%
group_by( First, Second ) %>%
summarise( ballots=n() ) %>% # remains grouped by First
mutate( frac=ballots/sum(ballots))
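To see why the comment "remains grouped by First" matters, here is a minimal sketch on a made-up table. The `Ballots` data frame and the names `ToyPairFrac` and `CheckSums` are hypothetical stand-ins, not part of the course data, and the explicit `.groups = "drop_last"` argument assumes dplyr 1.0 or later. Because `summarise()` peels off only the last grouping variable, `sum(ballots)` is taken within each first choice, so the fractions for each first-choice candidate should add up to 1.

```r
library(dplyr)

# Hypothetical toy ballots standing in for Minneapolis2013
Ballots <- data.frame(
  First  = c("A", "A", "A", "B", "B", "C"),
  Second = c("B", "B", "C", "A", "C", "A"),
  stringsAsFactors = FALSE
)

ToyPairFrac <- Ballots %>%
  group_by(First, Second) %>%
  # drop only the last grouping variable: the result is still grouped by First
  summarise(ballots = n(), .groups = "drop_last") %>%
  # sum(ballots) is therefore computed within each first choice
  mutate(frac = ballots / sum(ballots))

# One row per first choice, holding the sum of that candidate's fractions
CheckSums <- ToyPairFrac %>%
  summarise(total_frac = sum(frac))
```

Had the result been ungrouped before the `mutate()` step, `frac` would instead be each pair's share of all ballots.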
Since there are 38 candidates, there are \(38 \times 38 = 1444\) possible pairs.[1] Such a large number of edges would be hard to see. So let’s keep only the three highest-fraction pairs within each group (the code below groups by Second, so the ranking is among the first choices that accompany each second choice) and winnow further by requiring more than 20 ballots for the pair:
Edges <-
PairFrac %>%
group_by( Second ) %>%
filter( rank(desc(frac)) <= 3, ballots > 20 )
Edges
Source: local data frame [6 x 4]
Groups: Second

  First                      Second
1 ABDUL M RAHAMAN "THE ROCK" ABDUL M RAHAMAN "THE ROCK"
2 ABDUL M RAHAMAN "THE ROCK" undervote
3 ALICIA K. BENNETT          MIKE GOULD
4 ALICIA K. BENNETT          STEPHANIE WOODRUFF
5 BETSY HODGES               DON SAMUELS
6 BETSY HODGES               MARK ANDREW
Variables not shown: ballots (int), frac (dbl)
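The choice of grouping variable changes what "top 3" means. The sketch below uses a made-up table (`ToyPairs`, `TopByFirst`, and `TopBySecond` are hypothetical names) to contrast ranking within each first choice against ranking within each second choice, using the same `rank(desc(frac)) <= 3` filter as above.

```r
library(dplyr)

# Hypothetical pair counts: candidate A has four different second choices
ToyPairs <- data.frame(
  First   = c("A", "A", "A", "A", "B", "B"),
  Second  = c("B", "C", "D", "E", "A", "C"),
  ballots = c(400, 300, 200, 100, 700, 300),
  stringsAsFactors = FALSE
) %>%
  group_by(First) %>%
  mutate(frac = ballots / sum(ballots)) %>%
  ungroup()

# Top 3 second choices for each first-choice candidate
TopByFirst <- ToyPairs %>%
  group_by(First) %>%
  filter(rank(desc(frac)) <= 3, ballots > 20)

# Top 3 first choices accompanying each second-choice candidate
TopBySecond <- ToyPairs %>%
  group_by(Second) %>%
  filter(rank(desc(frac)) <= 3, ballots > 20)
```

In this toy table, grouping by `First` drops the pair A then E (A's fourth-ranked second choice), while grouping by `Second` keeps it, because E appears as a second choice only once.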
Each candidate will be a node in the network. But there is no natural order to the candidates that would dictate where they should be positioned in a diagram.
The edgesToVertices() function calculates vertex locations in a way that brings connected vertices nearby and separates unconnected vertices.
Vertices <-
edgesToVertices( Edges, from=First, to=Second )
head(Vertices)
ID x y
1 ABDUL M RAHAMAN "THE ROCK" -14.293 18.6610
2 ALICIA K. BENNETT 13.930 -3.1135
3 BETSY HODGES -8.226 -0.6626
4 BOB FINE 15.972 9.3680
5 CAM WINTON 15.173 13.4785
6 CHRISTOPHER CLARK 10.682 17.7229
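For readers curious how a layout of this kind can be computed, here is a rough sketch using a force-directed layout from the igraph package. This is an assumption for illustration only: the text does not say which algorithm edgesToVertices() uses, and `ToyEdges` and `ToyVertices` are hypothetical names. The sketch assumes igraph is installed.

```r
library(igraph)

# Hypothetical edge list with the same column names as Edges
ToyEdges <- data.frame(
  First  = c("A", "A", "B", "D"),
  Second = c("B", "C", "C", "E"),
  stringsAsFactors = FALSE
)

# The first two columns are read as the "from" and "to" ends of each edge;
# the vertex set is every name that appears in either column
g <- graph_from_data_frame(ToyEdges, directed = TRUE)

set.seed(1)              # the layout starts from random positions
xy <- layout_with_fr(g)  # Fruchterman-Reingold force-directed layout

# Arrange the result like the Vertices table: one row per node
ToyVertices <- data.frame(
  ID = V(g)$name,
  x  = xy[, 1],
  y  = xy[, 2],
  stringsAsFactors = FALSE
)
```

The exact coordinates depend on the random seed and the igraph version, so only the relative positions are meaningful: connected nodes tend to land near each other.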
This is easy to plot out, for instance:
Vertices %>%
ggplot( ) +
geom_text( aes(label=ID, x=x, y=y))
Now the x, y positions of the nodes can be used to set the start and ending positions of each of the edges. To combine the information in Vertices with that in Edges, use edgesForPlotting():
PositionedEdges <-
edgesForPlotting( Vertices, ID=ID, x, y, from=First, to=Second, Edges=Edges )
head( PositionedEdges)
Second First
1 ABDUL M RAHAMAN "THE ROCK" ABDUL M RAHAMAN "THE ROCK"
2 BETSY HODGES DON SAMUELS
3 BETSY HODGES MARK ANDREW
4 BETSY HODGES DOUG MANN
5 BOB FINE CAM WINTON
6 BOB FINE DAN COHEN
ballots frac x y xend yend
1 69 0.20414 -14.293 18.661 -14.293 18.6610
2 3346 0.40144 -3.762 -1.629 -8.226 -0.6626
3 6970 0.35590 -5.902 -5.138 -8.226 -0.6626
4 245 0.31451 -13.141 4.810 -8.226 -0.6626
5 597 0.07948 15.173 13.478 15.972 9.3680
6 119 0.06618 18.982 12.270 15.972 9.3680
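Conceptually, positioning the edges amounts to looking up each edge's two endpoints in the vertex table. The sketch below does this with two dplyr joins on made-up data. It is not the implementation of edgesForPlotting(), and which endpoint that function treats as the start of a segment is not assumed here. `ToyVertices`, `ToyEdges`, and `ToyPositioned` are hypothetical names.

```r
library(dplyr)

# Hypothetical node positions and edge list
ToyVertices <- data.frame(
  ID = c("A", "B", "C"),
  x  = c(0, 1, 2),
  y  = c(0, 1, 0),
  stringsAsFactors = FALSE
)
ToyEdges <- data.frame(
  First   = c("A", "B"),
  Second  = c("B", "C"),
  ballots = c(10, 5),
  stringsAsFactors = FALSE
)

ToyPositioned <- ToyEdges %>%
  # look up the position of the First node: becomes x, y
  inner_join(ToyVertices, by = c("First" = "ID")) %>%
  # look up the position of the Second node: becomes xend, yend
  inner_join(ToyVertices %>% rename(xend = x, yend = y),
             by = c("Second" = "ID"))
```

Each row of `ToyPositioned` then carries the edge's own variables (here `ballots`) together with the four coordinates that geom_segment() expects.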
geom_segment() lets you draw in the segments. Since the vertices and edges come from different data tables, geom_segment() needs to be passed the positioned edges as the data= argument.
Vertices %>%
ggplot( ) +
geom_text( aes(label=ID, x=x, y=y)) +
geom_segment( data=PositionedEdges,
aes( x=x, y=y, xend=xend, yend=yend))
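The same layering pattern can be tried on a small made-up network without the course data. In this sketch, `ToyVertices` and `ToyPositioned` are hypothetical tables with the same column layout as Vertices and PositionedEdges. The point is that the data piped into ggplot() serves the text layer, while the segment layer overrides it through its own `data=` argument.

```r
library(dplyr)
library(ggplot2)

# Hypothetical node positions
ToyVertices <- data.frame(
  ID = c("A", "B", "C"),
  x  = c(0, 1, 2),
  y  = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical edges, already carrying start and end coordinates
ToyPositioned <- data.frame(
  x    = c(0, 1),
  y    = c(0, 1),
  xend = c(1, 2),
  yend = c(1, 0)
)

p <- ToyVertices %>%
  ggplot() +
  # this layer inherits ToyVertices from ggplot()
  geom_text(aes(label = ID, x = x, y = y)) +
  # this layer uses its own table, supplied through data=
  geom_segment(data = ToyPositioned,
               aes(x = x, y = y, xend = xend, yend = yend))
```

Printing `p` renders the plot: three labels joined by two segments.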
You can see that there are two major groups and a couple of minor groups. It might be informative to use color to indicate how many ballots there are in each connection and the size of a dot to show the total number of first-choice votes a candidate got.
Votes <-
Minneapolis2013 %>%
mutate( ID=First ) %>%
group_by( ID ) %>%
summarise( total=n() )
Vertices %>%
inner_join( Votes ) %>%
ggplot( ) +
geom_point(alpha=.2, aes( x=x, y=y, size=total ) ) +
scale_size_area( max_size=50, guide="none" ) +
geom_text( aes(label=ID, x=x, y=y)) +
geom_segment( data=PositionedEdges,
aes( x=x, y=y, xend=xend, yend=yend, color=ballots )) +
theme(axis.ticks=element_blank(), axis.text=element_blank(), panel.background=element_blank())
You can see the political affiliations and other information about the candidates here. Betsy Hodges, Don Samuels, Mark Andrew, and Jackie Cherryhomes are all affiliated with the DFL; Cam Winton is an independent endorsed by the Republican Party.
Written by Daniel Kaplan for the Data & Computing Fundamentals Course. Development was supported by grants from the National Science Foundation for Project Mosaic (NSF DUE-0920350) and from the Howard Hughes Medical Institute.
[1] Including ballots where the first- and second-place choices are the same.