Wide vs Narrow Data Tables

A data table comprises cases and variables.

Each variable comprises values (or levels).

There is no hard distinction between a variable and a value. What’s a variable in one situation may be a value in another, and vice versa.

A data table

Students
who x y dorm
Alice 7 English Doty
Lesley 19 Mandarin Doty
Yu 23 French Kirk
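
As a minimal sketch, the Students table could be entered in R as a base data frame. The column names are taken from the table; the choice of `data.frame()` and of `stringsAsFactors = FALSE` is an assumption made for illustration, not something the slides specify.

```r
# Each column of the data frame is a variable; each row is a case.
Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

nrow(Students)   # number of cases
names(Students)  # the variables
```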

Cases, Variables, and Values

  • Variables: who, x, y, and dorm
    • Values:
      • who is a person’s name
      • x is numeric
      • y is a language name
      • dorm is a building name
  • Cases: Alice, Lesley, Yu
who x y dorm
Alice 7 English Doty
Lesley 19 Mandarin Doty
Yu 23 French Kirk

Two formats

  • Narrow
who dorm key value
Alice Doty x 7
Lesley Doty x 19
Yu Kirk x 23
Alice Doty y English
Lesley Doty y Mandarin
Yu Kirk y French
  • Wide
who x y dorm
Alice 7 English Doty
Lesley 19 Mandarin Doty
Yu 23 French Kirk

Narrow and Wide

Data in Key/Value format are narrow.

The corresponding wide format

  • has a separate variable for each level in Key
  • sets the values of those variables from the info in Value

Wide:

who x y dorm
Alice 7 English Doty
Lesley 19 Mandarin Doty

Narrow:

who dorm key value
Alice Doty x 7
Lesley Doty x 19
Alice Doty y English
Lesley Doty y Mandarin

Narrow is relative

who key value
Alice x 7
Lesley x 19
Yu x 23
Alice y English
Lesley y Mandarin
Yu y French
Alice dorm Doty
Lesley dorm Doty
Yu dorm Kirk
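
How narrow a table is depends on how many variables are gathered. The sketch below assumes the tidyr package is installed and rebuilds the Students table so it runs on its own; `Narrow` and `Narrower` are names chosen here for illustration. It calls `gather(Students, ...)` directly, which is the same operation as the piped form `Students %>% gather(...)`.

```r
library(tidyr)

Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

# Gathering x and y leaves who and dorm to identify each case.
Narrow <- gather(Students, key, value, x, y)

# Gathering dorm as well leaves only who to identify each case.
Narrower <- gather(Students, key, value, x, y, dorm)

names(Narrow)
names(Narrower)
nrow(Narrower)  # three gathered variables for each of three students
```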

Too narrow

key value
x 7
x 19
x 23
y English
y Mandarin
y French
dorm Doty
dorm Doty
dorm Kirk
who Alice
who Lesley
who Yu

There’s nothing to identify a case!
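
A hedged sketch of why this form is unusable, assuming tidyr is installed: once who is gathered too, no column is left to say which rows belong together, so an attempt to widen the table again should fail. `TooNarrow` and `result` are illustrative names, and the error is caught with `tryCatch()` only so the example runs to completion.

```r
library(tidyr)

Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

# Gather every variable, including who: only key and value remain.
TooNarrow <- gather(Students, key, value, x, y, dorm, who)
names(TooNarrow)

# Widening fails: several rows share each key, and nothing records
# which of them came from the same case.
result <- tryCatch(
  spread(TooNarrow, key, value),
  error = function(e) e
)
inherits(result, "error")
```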

Gather — from Wide to Narrow

Syntax:

WideInput %>% 
  gather(key_name, value_name, ...)

The ... are the variables to be gathered together, e.g.

StudentsNarrow <- Students %>% gather(key, value, x, y)
who dorm key value
Alice Doty x 7
Lesley Doty x 19
Yu Kirk x 23
Alice Doty y English
Lesley Doty y Mandarin
Yu Kirk y French

Cases in Narrow data

Aside from Key and Value, all the other variables identify the case.

Gathering produces several rows for each row in the wide form, one per gathered variable. The variables that are not gathered are copied into each of the new rows.
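
The row multiplication can be checked directly. This is a small sketch assuming tidyr is installed, with the Students table rebuilt so the block runs on its own. It also shows a side effect worth knowing: because x is numeric and y is text, the shared value column ends up holding text.

```r
library(tidyr)

Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

StudentsNarrow <- gather(Students, key, value, x, y)

# Two variables were gathered, so each wide row yields two narrow rows.
nrow(StudentsNarrow) == 2 * nrow(Students)

# The ungathered variables (who, dorm) are repeated on each new row.
StudentsNarrow[StudentsNarrow$who == "Alice", ]

# Numbers and text share one column, so the numbers are stored as text.
class(StudentsNarrow$value)
```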

Spread — from Narrow to Wide

Syntax:

NarrowInput %>% spread(key, value)

Process:

  1. Group by all variables other than Key and Value. These groups become the cases.
  2. Create new variables for each level in Key
  3. Within each group, spread out the Values into the new variables.
StudentsNarrow %>% spread(key, value)
who dorm x y
Alice Doty 7 English
Lesley Doty 19 Mandarin
Yu Kirk 23 French
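
A sketch of the full round trip, assuming tidyr is installed. `WideText` and `WideTyped` are illustrative names. Because the gathered value column holds text, a plain `spread()` returns x as text; the `convert = TRUE` argument of `spread()` asks it to re-guess the type of each new column. Recent tidyr releases document `gather()` and `spread()` as superseded by `pivot_longer()` and `pivot_wider()`, but both functions still work as shown in these slides.

```r
library(tidyr)

Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

StudentsNarrow <- gather(Students, key, value, x, y)

# Plain spread(): x comes back as text, since value was a text column.
WideText <- spread(StudentsNarrow, key, value)
class(WideText$x)

# convert = TRUE re-guesses each new column's type, so x is numeric again.
WideTyped <- spread(StudentsNarrow, key, value, convert = TRUE)
class(WideTyped$x)
```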