Networks

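Consider the Minneapolis2013 data, which records the ranked-choice ballots cast in the 2013 Minneapolis mayoral election. Presumably there is some relationship between a voter’s first and second choices. This relationship could be visualized as a network: each candidate is a node, and edges connect pairs of candidates who were commonly selected together as first and second choices.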

Here’s an analysis that computes, for each first-choice candidate, what fraction of that candidate’s ballots named each second choice:

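library(dplyr)       # group_by(), summarise(), mutate()
library(mosaicData)  # provides the Minneapolis2013 ballots

PairFrac <- Minneapolis2013 %>%
  group_by( First, Second ) %>%
  summarise( ballots=n() ) %>% # remains grouped by First
  mutate( frac=ballots/sum(ballots)) # share of each First candidate's ballots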
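To see what the grouped mutate() computes, here is a minimal base-R sketch on a made-up handful of ballots. The pair_frac table and the candidates A, B, and C are hypothetical and not taken from Minneapolis2013. The sketch counts each (First, Second) pair and divides by the total for that First candidate, so the fractions for each first choice add up to 1.

```r
# Hypothetical ballots (toy data, not Minneapolis2013)
ballots <- data.frame(
  First  = c("A", "A", "A", "B", "B", "C"),
  Second = c("B", "B", "C", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# Count ballots for every (First, Second) combination
pair_frac <- as.data.frame(table(First = ballots$First, Second = ballots$Second),
                           responseName = "ballots", stringsAsFactors = FALSE)

# Keep only pairs that actually occur, as summarise() does
pair_frac <- pair_frac[pair_frac$ballots > 0, ]

# Divide each count by the total for its First candidate
pair_frac$frac <- pair_frac$ballots / ave(pair_frac$ballots, pair_frac$First, FUN = sum)
pair_frac
```

Candidate A’s three ballots split 2 to 1 between B and C, so the A-to-B pair gets a frac of 2/3.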

Since there are 38 candidates, there are \(38 \times 38 = 1444\) possible ordered pairs.[1] Such a large number of edges would be hard to see. So let’s winnow them: for each second-choice candidate, keep the (at most) three first-choice candidates whose voters were most likely to pick that candidate second, and further require that there be more than 20 ballots for the pair:

Edges <-
  PairFrac %>%
  group_by( Second ) %>%
  filter( rank(desc(frac)) <= 3, ballots > 20 )
Edges
Source: local data frame [6 x 4]
Groups: Second

                       First                     Second
1 ABDUL M RAHAMAN "THE ROCK" ABDUL M RAHAMAN "THE ROCK"
2 ABDUL M RAHAMAN "THE ROCK"                  undervote
3          ALICIA K. BENNETT                 MIKE GOULD
4          ALICIA K. BENNETT         STEPHANIE WOODRUFF
5               BETSY HODGES                DON SAMUELS
6               BETSY HODGES                MARK ANDREW
Variables not shown: ballots (int), frac (dbl)
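The ranking step is easy to misread, so here is a base-R sketch of the same filter on a small pair table. The values are made up for illustration. As in the dplyr code, rank() is applied to the negated fraction within each Second group, and both conditions are checked against the whole group. This means the rank is computed before the ballot threshold removes anything.

```r
# Hypothetical pair table with the same columns as PairFrac
pairs <- data.frame(
  First   = c("A", "B", "C", "D", "A", "B"),
  Second  = c("X", "X", "X", "X", "Y", "Y"),
  ballots = c(100, 50, 30, 10, 15, 200),
  frac    = c(0.40, 0.10, 0.30, 0.25, 0.05, 0.50),
  stringsAsFactors = FALSE
)

# Rank frac within each Second group, largest first (1 = largest).
# Like base rank() in the dplyr code, ties get averaged ranks.
pairs$r <- ave(-pairs$frac, pairs$Second, FUN = rank)

# Keep the top 3 per Second, and only pairs with more than 20 ballots
kept <- pairs[pairs$r <= 3 & pairs$ballots > 20, ]
kept
```

Group X keeps A and C. D is ranked third but has only 10 ballots, so that group ends up with two edges rather than three. Group Y keeps only B.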

Each candidate will be a node in the network. But there is no natural order to the candidates that would dictate where they should be positioned in a diagram.

The edgesToVertices() function calculates vertex locations in a way that brings connected vertices nearby and separates unconnected vertices.

Vertices <-
  edgesToVertices( Edges, from=First, to=Second )
head(Vertices)
                          ID       x       y
1 ABDUL M RAHAMAN "THE ROCK" -14.293 18.6610
2          ALICIA K. BENNETT  13.930 -3.1135
3               BETSY HODGES  -8.226 -0.6626
4                   BOB FINE  15.972  9.3680
5                 CAM WINTON  15.173 13.4785
6          CHRISTOPHER CLARK  10.682 17.7229
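The text does not say which algorithm edgesToVertices() uses, so the following only illustrates the general idea. It is a Fruchterman-Reingold style force-directed layout written in base R, and the function name force_layout() and its parameters are hypothetical. Every pair of vertices pushes apart with force k^2/d, every edge pulls its endpoints together with force d^2/k, and a shrinking step size lets the layout settle.

```r
# A sketch of a force-directed layout (hypothetical helper, not the
# code behind edgesToVertices()).
force_layout <- function(from, to, iterations = 200, k = 1, seed = 1) {
  set.seed(seed)                                   # reproducible random start
  ids <- unique(c(from, to))
  n <- length(ids)
  pos <- matrix(runif(2 * n, -1, 1), ncol = 2,
                dimnames = list(ids, c("x", "y")))
  e_from <- match(from, ids)
  e_to   <- match(to, ids)
  temp <- 0.1                                      # largest allowed step
  for (it in seq_len(iterations)) {
    disp <- matrix(0, n, 2)
    # Repulsion: every vertex j pushes vertex i away with force k^2 / d
    for (i in seq_len(n)) {
      delta <- -sweep(pos, 2, pos[i, ])            # rows are pos[i, ] - pos[j, ]
      d2 <- pmax(rowSums(delta^2), 1e-9)
      d2[i] <- Inf                                 # no self-repulsion
      disp[i, ] <- colSums(delta * (k^2 / d2))     # (delta / d) * (k^2 / d)
    }
    # Attraction: each edge pulls its endpoints together with force d^2 / k
    for (e in seq_along(e_from)) {
      a <- e_from[e]
      b <- e_to[e]
      if (a == b) next                             # self-loops exert no pull
      delta <- pos[a, ] - pos[b, ]
      f <- delta * sqrt(sum(delta^2)) / k          # (delta / d) * (d^2 / k)
      disp[a, ] <- disp[a, ] - f
      disp[b, ] <- disp[b, ] + f
    }
    # Move each vertex along its net force, capping the step at temp
    len <- sqrt(rowSums(disp^2))
    pos <- pos + disp * (pmin(len, temp) / pmax(len, 1e-9))
    temp <- temp * 0.98                            # cool down
  }
  data.frame(ID = ids, x = pos[, "x"], y = pos[, "y"],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Usage: two separate triangles should end up as two separate clusters
tri_layout <- force_layout(from = c("A", "B", "C", "D", "E", "F"),
                           to   = c("B", "C", "A", "E", "F", "D"))
tri_layout
```

Self-loops, such as the ABDUL M RAHAMAN "THE ROCK" pair above, are skipped in the attraction step because an edge from a vertex to itself cannot pull it anywhere. Since the starting positions are random, a different seed gives a different layout, although connected vertices still tend to end up close together.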

These positions are easy to plot, for instance:

library(ggplot2)
Vertices %>% 
  ggplot(  ) + 
  geom_text( aes(label=ID, x=x, y=y))  # write each candidate's name at its (x, y)

[Figure: candidate names placed at their computed x, y positions]

Now the x, y positions of the nodes can be used to set the start and end positions of each edge. To combine the information in Vertices with that in Edges, use edgesForPlotting():

PositionedEdges <-
  edgesForPlotting( Vertices, ID=ID, x, y, from=First, to=Second, Edges=Edges )
head( PositionedEdges)
                      Second                      First
1 ABDUL M RAHAMAN "THE ROCK" ABDUL M RAHAMAN "THE ROCK"
2               BETSY HODGES                DON SAMUELS
3               BETSY HODGES                MARK ANDREW
4               BETSY HODGES                  DOUG MANN
5                   BOB FINE                 CAM WINTON
6                   BOB FINE                  DAN COHEN
  ballots    frac       x      y    xend    yend
1      69 0.20414 -14.293 18.661 -14.293 18.6610
2    3346 0.40144  -3.762 -1.629  -8.226 -0.6626
3    6970 0.35590  -5.902 -5.138  -8.226 -0.6626
4     245 0.31451 -13.141  4.810  -8.226 -0.6626
5     597 0.07948  15.173 13.478  15.972  9.3680
6     119 0.06618  18.982 12.270  15.972  9.3680
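In the output above, x and y hold the First candidate’s position and xend and yend hold the Second candidate’s position. For example, row 2 ends at BETSY HODGES’s coordinates from Vertices. The internals of edgesForPlotting() are not shown here, so this is only a base-R sketch of that lookup on toy data. The tables and names are hypothetical.

```r
# Hypothetical vertex positions and edges (toy data, not the election)
verts <- data.frame(ID = c("A", "B", "C"),
                    x  = c(0, 1, 2),
                    y  = c(0, 1, 0),
                    stringsAsFactors = FALSE)
toy_edges <- data.frame(First   = c("A", "B"),
                        Second  = c("B", "C"),
                        ballots = c(10, 5),
                        stringsAsFactors = FALSE)

# Find each endpoint's row in the vertex table
i_from <- match(toy_edges$First,  verts$ID)
i_to   <- match(toy_edges$Second, verts$ID)

# Segment starts at First's position and ends at Second's position
positioned <- transform(toy_edges,
                        x    = verts$x[i_from], y    = verts$y[i_from],
                        xend = verts$x[i_to],   yend = verts$y[i_to])
positioned
```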

geom_segment() draws the edges as line segments. Since the vertices and edges come from different data tables, geom_segment() needs to be given the positioned edges through its data= argument.

Vertices %>% 
  ggplot(  ) + 
  geom_text( aes(label=ID, x=x, y=y)) +
  geom_segment( data=PositionedEdges, 
                aes( x=x, y=y, xend=xend, yend=yend))

[Figure: the candidate names, with a line segment drawn for each edge]

You can see that there are two major groups and a couple of minor groups. It might be informative to use color to show how many ballots there are for each connection, and to use the size of a dot to show the total number of first-choice votes each candidate received.

Votes <-
  Minneapolis2013 %>%
  mutate( ID=First ) %>%
  group_by( ID ) %>%
  summarise( total=n() )   # first-choice votes for each candidate
Vertices %>%
  inner_join( Votes, by="ID" ) %>%
  ggplot(  ) + 
  geom_point(alpha=.2, aes( x=x, y=y, size=total ) ) +  # dot area ~ votes
  scale_size_area( max_size=50, guide="none" ) +
  geom_text( aes(label=ID, x=x, y=y)) +
  geom_segment( data=PositionedEdges, 
                aes( x=x, y=y, xend=xend, yend=yend, color=ballots )) +
  theme(axis.ticks=element_blank(), 
        axis.text=element_blank(), 
        panel.background=element_blank())

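[Figure: the candidate network, with dot size showing each candidate’s first-choice votes and segment color showing the number of ballots for each pair]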
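One consequence of using inner_join() is that a vertex appears in the point and text layers only if it also has a row in Votes. A vertex that never appears as a first choice would lose its dot and label, but its edges would still be drawn, because those come from PositionedEdges. Here is a minimal base-R sketch of that behavior on hypothetical toy data. base::merge() performs an inner join by default.

```r
# Hypothetical vertex positions and first-choice ballots (toy data)
verts <- data.frame(ID = c("A", "B", "C"), x = c(0, 1, 2), y = c(0, 1, 0),
                    stringsAsFactors = FALSE)
first_choices <- c("A", "A", "B", "A", "B")   # "C" is never ranked first

# Count ballots per first choice, like summarise( total=n() )
votes <- as.data.frame(table(ID = first_choices),
                       responseName = "total", stringsAsFactors = FALSE)

# merge() keeps only IDs present in both tables, so vertex "C" is dropped
sized <- merge(verts, votes, by = "ID")
sized
```

scale_size_area() then makes each dot’s area, rather than its radius, proportional to total.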

The candidates’ political affiliations help explain the groups. Betsy Hodges, Don Samuels, Mark Andrew, and Jackie Cherryholmes are all affiliated with the DFL (Minnesota’s Democratic-Farmer-Labor Party); Cam Winton is an independent endorsed by the Republican Party.



Written by Daniel Kaplan for the Data & Computing Fundamentals Course. Development was supported by grants from the National Science Foundation for Project Mosaic (NSF DUE-0920350) and from the Howard Hughes Medical Institute.


[1] Including ballots where the first- and second-place choices are the same.