A data table comprises cases and variables.
Each variable comprises values (or levels).
There is no hard distinction between a variable and a value. What’s a variable in one situation may be a value in another, and vice versa.
A data table
Students
who | x | y | dorm |
---|---|---|---|
Alice | 7 | English | Doty |
Lesley | 19 | Mandarin | Doty |
Yu | 23 | French | Kirk |
The same data in narrow (key/value) form:
who | dorm | key | value |
---|---|---|---|
Alice | Doty | x | 7 |
Lesley | Doty | x | 19 |
Yu | Kirk | x | 23 |
Alice | Doty | y | English |
Lesley | Doty | y | Mandarin |
Yu | Kirk | y | French |
Data in key/value format are narrow.
The corresponding wide format has one column for each key.
Wide:
who | x | y | dorm |
---|---|---|---|
Alice | 7 | English | Doty |
Lesley | 19 | Mandarin | Doty |
Narrow:
who | dorm | key | value |
---|---|---|---|
Alice | Doty | x | 7 |
Lesley | Doty | x | 19 |
Alice | Doty | y | English |
Lesley | Doty | y | Mandarin |
Gathering dorm as well leaves only who to identify each case:
who | key | value |
---|---|---|
Alice | x | 7 |
Lesley | x | 19 |
Yu | x | 23 |
Alice | y | English |
Lesley | y | Mandarin |
Yu | y | French |
Alice | dorm | Doty |
Lesley | dorm | Doty |
Yu | dorm | Kirk |
Gathering who too leaves only key and value:
key | value |
---|---|
x | 7 |
x | 19 |
x | 23 |
y | English |
y | Mandarin |
y | French |
dorm | Doty |
dorm | Doty |
dorm | Kirk |
who | Alice |
who | Lesley |
who | Yu |
There’s nothing to identify a case!
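A minimal sketch of this failure in R, assuming the tidyr package is installed and the Students table is stored as a data frame of that name (AllGathered and result are illustrative names only). Gathering every column, including who, leaves only key and value, so there is nothing left to say which rows belong to the same case.

```r
library(tidyr)

# The Students table; stringsAsFactors = FALSE keeps text columns as character.
Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

# Gather every column, so no variable remains to identify a case.
AllGathered <- Students %>% gather(key, value, x, y, dorm, who)
dim(AllGathered)  # 12 rows, 2 columns (key and value)

# Widening again should fail: several rows share each key, and no
# remaining column says which case a given value came from.
result <- tryCatch(
  AllGathered %>% spread(key, value),
  error = function(e) e
)
inherits(result, "error")
```

Keeping at least one identifying variable (here who) out of the gather is what makes the narrow form reversible.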
Syntax:
WideInput %>% gather(key_name, value_name, ...)
The ... are the variables to be gathered together, e.g.
StudentsNarrow <- Students %>% gather(key, value, x, y)
who | dorm | key | value |
---|---|---|---|
Alice | Doty | x | 7 |
Lesley | Doty | x | 19 |
Yu | Kirk | x | 23 |
Alice | Doty | y | English |
Lesley | Doty | y | Mandarin |
Yu | Kirk | y | French |
Aside from Key and Value, all the other variables identify the case.
The gathering makes multiple rows for each row in the wide form. The variables not used for narrowing are copied into the new multiple cases.
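The row multiplication and copying described here can be checked with a short sketch, assuming tidyr is installed and Students is built as a plain data frame (the construction below is an assumption about how the table is stored).

```r
library(tidyr)

# The Students table in wide form.
Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

# Gather x and y into key/value pairs; who and dorm are left alone.
StudentsNarrow <- Students %>% gather(key, value, x, y)

nrow(Students)        # 3 cases in the wide form
nrow(StudentsNarrow)  # 6 rows: one per case per gathered variable

# who and dorm were not gathered, so they are copied onto each new row.
StudentsNarrow[StudentsNarrow$who == "Alice", ]

# x is numeric and y is text, so the shared value column is stored as character.
class(StudentsNarrow$value)
```

Because a single column can hold only one type, mixing numeric and text variables in one gather turns the numbers into strings.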
Syntax:
NarrowInput %>% spread(key, value)
Process:
StudentsNarrow %>% spread(key, value)
who | dorm | x | y |
---|---|---|---|
Alice | Doty | 7 | English |
Lesley | Doty | 19 | Mandarin |
Yu | Kirk | 23 | French |
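As a rough round-trip check, the sketch below gathers and then spreads the same table, assuming tidyr is installed (StudentsWide is an illustrative name). The convert = TRUE argument is an addition not used in the syntax shown here: it asks spread() to re-guess column types, since value was stored as text after gathering.

```r
library(tidyr)

# The Students table in wide form.
Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Doty", "Doty", "Kirk"),
  stringsAsFactors = FALSE
)

# Narrow the table, then widen it again.
StudentsNarrow <- Students %>% gather(key, value, x, y)
StudentsWide   <- StudentsNarrow %>% spread(key, value, convert = TRUE)

# One row per case again, with x and y restored as columns.
names(StudentsWide)
nrow(StudentsWide)

# With convert = TRUE, x comes back as a number rather than a string.
is.numeric(StudentsWide$x)
```

Without convert = TRUE, x would come back as character. In tidyr 1.0.0 and later, gather() and spread() still work but are superseded by pivot_longer() and pivot_wider().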