Daniel Kaplan
June 13, 2013
The tabular format for data
A single value is stored in each cell. For us …
Auxiliary information stored in a Codebook
What constitutes a Case?
OrdwayBirdsOrig
nhanes
WakeVotersSmall
What is each Variable?
data(OrdwayBirdsOrig)
names(OrdwayBirdsOrig)
[1] "bogus" "Timestamp" "Year"
[4] "Day" "Month" "CaptureTime"
[7] "SpeciesName" "Sex" "Age"
[10] "BandNumber" "TrapID" "Weather"
[13] "BandingReport" "RecaptureYN" "RecaptureMonth"
[16] "RecaptureDay" "Condition" "Release"
[19] "Comments" "DataEntryPerson" "Weight"
[22] "WingChord" "Temperature" "RecaptureOriginal"
[25] "RecapturePrevious" "TailLength"
nrow(OrdwayBirdsOrig)
[1] 15829
summary(OrdwayBirdsOrig$Month)
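Month level | Count |
---|---|
"" (blank) | 4 |
1 | 622 |
10 | 3315 |
11 | 1164 |
12 | 552 |
2 | 601 |
25 | 1 |
3 | 905 |
4 | 1584 |
5 | 2433 |
6 | 1097 |
7 | 1037 |
8 | 749 |
9 | 1765 |
Month | 0 |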
class(OrdwayBirdsOrig$Month)
[1] "factor"
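Because Month is stored as a factor, its values are text labels rather than numbers. The source names as.quantitative() for the conversion. The sketch below does not use that function. It uses only base R on a small hypothetical factor to show why a plain as.numeric() is not enough.

```r
# Hypothetical factor standing in for a Month column stored as text labels
month <- factor(c("10", "3", "10", "12"))
levels(month)                     # "10" "12" "3": levels sort as text

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(month)        # 1 3 1 2, not the months

# Going through character recovers the values written in the data
values <- as.numeric(as.character(month))   # 10 3 10 12

# Labels that are not numbers (such as "" or "Month") become NA
messy <- factor(c("7", "", "Month"))
cleaned <- suppressWarnings(as.numeric(as.character(messy)))  # 7 NA NA
```

Levels like "" and "Month" in the real data would turn into NA this way. Whether that is the "sensible thing" depends on the analysis.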
Important classes:
data.frame: a collection of cases
numeric, integer: quantitative
factor, character: categorical
We're expecting Month to be quantitative, but it isn't.
as.quantitative() might do sensible things to convert a factor to a quantitative variable.
summary() is an allowed method for both quantitative and categorical variables:
summary(OrdwayBirdsOrig$SpeciesName)
Slate-colored Junco Tree Sparrow American Goldfinch
2717 1331 1153
Black-capped Chickadee Robin Field Sparrow
1110 606 582
Catbird Song Sparrow Blue Jay
551 509 476
House Wren Myrtle Warbler Lincoln's Sparrow
457 446 395
Chipping Sparrow White-breasted Nuthatch Fox Sparrow
315 236 234
White-throated Sparrow House Sparrow Baltimore Oriole
229 207 206
Tree Swallow Black-capped chickadee Bluebird
201 187 170
Nashville Warbler Downy Woodpecker Rose-breasted Grosbeak
160 143 137
Least Flycatcher Purple Finch Brown Thrasher
124 121 109
Yellowthroat Swainson's Thrush Indigo Bunting
107 103 101
Palm Warbler Cowbird White-throat Sparrow
97 90 86
Tennessee Warbler Swamp Sparrow White-crowned Sparrow
85 83 78
Cardinal Ruby-crowned Kinglet Common Redpoll
76 75 72
Grackle Cedar Waxwing American Gold Finch
69 57 50
Eastern Bluebird Traill's Flycatcher Red-winged Blackbird
49 47 39
Red-eyed Vireo Starling Wood Pewee
37 37 37
Yellow-shafted Flicker Rose-breast Grosbeak Black-throat Sparrow
34 33 31
Orange-crown Warbler Pine Siskin Mourning Dove
31 31 29
Hairy Woodpecker Barn Swallow Ovenbird
25 23 23
White-breast Nuthatch Wilson's Warbler Bank Swallow
23 22 21
Hermit Thrush Ruby-Crowned Kinglet Philadelphia Vireo
21 21 20
Phoebe Yellow Warbler Common Grackle
19 19 18
Ruby-crested Kinglet White-Breasted Nuthatch Orange-crowned Warbler
18 18 17
White-crown Sparrow Ruby-crown Kinglet Black-billed Cuckoo
17 16 15
Red-breast Grosbeak Solitary Vireo Black-Capped Chickadee
15 14 13
Blackcapped Chickadee Northern Waterthrush Palm Warbler (W)
13 13 13
Rose-Breasted Grosbeak Clay-colored Sparrow Golden-crowned Kinglet
13 12 12
Brown Towhee Eastern Phoebe Flicker
11 11 11
Gray-cheeked Thrush Northern Shrike Olive-sided Flycatcher
11 11 11
Red-wing Blackbird Steller's Jay White-Throated Sparrow
11 11 11
Palm Warbler (Y) Rough-winged Swallow Slate-Colored Junco
10 10 10
Black and White Warbler N/A Orange-Crowned Warbler
9 9 9
Pewee Cactus Wren Harris' Sparrow
9 8 8
(Other)
398
levels(OrdwayBirdsOrig$SpeciesName)
[1] ""
[2] "-lost-"
[3] "-missing-"
[4] "[Nothing, just dashes]"
[5] "13:00:00"
[6] "Acadian Flycatcher"
[7] "American Gold Finch"
[8] "American Goldfinch"
[9] "American Golf Finch"
[10] "American Redfinch"
[11] "American Redstart"
[12] "American Robin"
[13] "Arkansas Kingbird"
[14] "Baltimore Oriole"
[15] "Bank Swallow"
[16] "Barn Swallow"
[17] "Batimore Oriole"
[18] "Bay-breasted Warbler"
[19] "Blac-capped Chickadee"
[20] "Black and White Warbler"
[21] "Black-and-white Warbler"
[22] "Black-billed Cookoo"
[23] "Black-billed Cuckoo"
[24] "Black-capeed Chickadee"
[25] "Black-capped Chicakdee"
[26] "Black-capped chickadee"
[27] "Black-capped Chickadee"
[28] "Black-Capped Chickadee"
[29] "Black-capped Chikadee"
[30] "Black-throat Sparrow"
[31] "Black-throat-Sparrow"
[32] "Blackcapped Chickadee"
[33] "Blackpoll Warbler"
[34] "Blatimore Oriole"
[35] "Blue Jay"
[36] "Blue-headed Vireo"
[37] "Blue-winged Warbler"
[38] "Bluebird"
[39] "Boreal Chickadee"
[40] "Brewer's Sparrow"
[41] "Brown Creeper"
[42] "Brown Thrasher"
[43] "Brown Towhee"
[44] "Brown-head Cowbird"
[45] "Brown-headed Cowbird"
[46] "Cactus Wren"
[47] "Car"
[48] "Cardinal"
[49] "Carolina Chickadee"
[50] "Cartbird"
[51] "Catbird"
[52] "Catbird I"
[53] "Catibird"
[54] "Cedar waxwing"
[55] "Cedar Waxwing"
[56] "Chestnut-backed Chickadee"
[57] "Chestnut-sided Warbler"
[58] "Chic E"
[59] "Chickadee"
[60] "Chip Sparrow"
[61] "Chipping Sparrow"
[62] "Chirpping Sparrow"
[63] "Chripping Sparrow"
[64] "Clay-col Sparrow"
[65] "Clay-colored Sparrow"
[66] "Clay-Colored Sparrow"
[67] "Clear; moderate winds; traps closed about 2 hours"
[68] "Cloudy; calm"
[69] "Common Crow"
[70] "Common Grackle"
[71] "Common Nighthawk"
[72] "Common Redpoll"
[73] "Common Yellowthroat"
[74] "Connecticut Warbler"
[75] "Corve-billed Thrasher"
[76] "Cowbird"
[77] "Curve-billed Thrasher"
[78] "Downy Woodpecker"
[79] "E Bluebird"
[80] "E. Wood Pewee"
[81] "E/Net"
[82] "Easter Phoebe"
[83] "Eastern Bluebird"
[84] "Eastern Kingbird"
[85] "Eastern Meadowlark"
[86] "Eastern Phoebe"
[87] "Eastern Robin"
[88] "Eastern Wood Pewee"
[89] "Field Sparrow"
[90] "Filed Sparrow"
[91] "Flicker"
[92] "Fox Sparrow"
[93] "Golden-Crested Kinglet"
[94] "Golden-crowned Kinglet"
[95] "Golden-Crowned Kinglet"
[96] "Goldfinch"
[97] "Grackle"
[98] "Gray-checked Thrush"
[99] "Gray-cheek Thrush"
[100] "Gray-cheeked Thrush"
[101] "Great Crested Flycatcher"
[102] "Great-crested Flycatcher"
[103] "Green Heron"
[104] "Grey-cheeked Thrush"
[105] "Ground Dove"
[106] "Grt-crested Flycatcher"
[107] "Hairy Woodpecker"
[108] "Harris Sparrow"
[109] "Harris' Sparrow"
[110] "Harris's Sparrow"
[111] "Hermit Thrush"
[112] "Horned Lark"
[113] "House Finch"
[114] "House Sparrow"
[115] "House wren"
[116] "House Wren"
[117] "House Wrren"
[118] "Inca Dove"
[119] "Indigo Bunting"
[120] "Kestral"
[121] "Kestrel"
[122] "Killdeer"
[123] "Kingbird"
[124] "Kiskadee F.C."
[125] "Least Fly Catcher"
[126] "Least flycatcher"
[127] "Least Flycatcher"
[128] "Lincoln Sparrow"
[129] "Lincoln's Sparrow"
[130] "lost"
[131] "Lost"
[132] "LOST"
[133] "M/1"
[134] "Magnolia Warbler"
[135] "Mockingbird"
[136] "Mourning Dove"
[137] "Mourning Warbler"
[138] "Mrytle Warbler"
[139] "Myrtl Warbler"
[140] "Myrtle Warbler"
[141] "N/A"
[142] "Nashville Warbler"
[143] "none"
[144] "Northern Shrike"
[145] "Northern Waterthrush"
[146] "Northern Yellowthroat"
[147] "Olive-sided"
[148] "Olive-sided Flycatcher"
[149] "Orange-crown Warbler"
[150] "Orange-crowned Warbler"
[151] "Orange-Crowned Warbler"
[152] "Orchard Oriole"
[153] "Oregon Junco"
[154] "Ovenbird"
[155] "Overbird"
[156] "Palm Warbler"
[157] "Palm Warbler (w)"
[158] "Palm Warbler (W)"
[159] "Palm Warbler (Y)"
[160] "Partly-cloudy; light winds"
[161] "Pectoral Sandpiper"
[162] "Pewee"
[163] "Phainopepla"
[164] "Philadelphia Vireo"
[165] "Philadeplhia Vireo"
[166] "Phoebe"
[167] "Pine Siskin"
[168] "Pine SIskin"
[169] "Purple Finch"
[170] "Purple FInch"
[171] "Pyrrhuloxia"
[172] "Red-bellied Sapsucker"
[173] "Red-bellied Woodpecker"
[174] "Red-Bellied Woodpecker"
[175] "Red-breast Grosbeak"
[176] "Red-Breast Grosbeak"
[177] "Red-breasted Grosbeak"
[178] "Red-eye Vireo"
[179] "Red-eyed Cowbird"
[180] "Red-eyed Viero"
[181] "Red-Eyed Viero"
[182] "Red-eyed Vireo"
[183] "Red-headed Woodpecker"
[184] "Red-tailed Hawk"
[185] "Red-wing Blackbird"
[186] "Red-winged Blackbird"
[187] "Red-Winged Blackbird"
[188] "Red-winged Blackhead"
[189] "Redstart"
[190] "Robin"
[191] "Robn"
[192] "Rose Breasted Grosbeak"
[193] "Rose Brested Grosbeak"
[194] "Rose-beak Grosbeak"
[195] "Rose-breast Grosbeak"
[196] "Rose-breasted Grosbeak"
[197] "Rose-Breasted Grosbeak"
[198] "Rose-Breasted Grosbeck"
[199] "Rose-breasted Groshk"
[200] "Rough-winged Swallow"
[201] "Ruby-breasted Grosbeak"
[202] "Ruby-crested Kinglet"
[203] "Ruby-Crested Kinglet"
[204] "Ruby-crown Kinglet"
[205] "Ruby-crowned Kinglet"
[206] "Ruby-Crowned Kinglet"
[207] "Ruby-throated Hummingbird"
[208] "Rudycrested Kinglet"
[209] "Ruf-sided Towhee"
[210] "Rufoos-sided Towhee"
[211] "Rufous-sided Towhee"
[212] "Rufous-sTowhee"
[213] "Savannah Sparrow"
[214] "Slate-colored junco"
[215] "Slate-colored Junco"
[216] "Slate-Colored Junco"
[217] "Slate-colorerd Junco"
[218] "Slate-colorred Junco"
[219] "Solitary Vireo"
[220] "Song (Field?) Sparrow"
[221] "Song Sparrow"
[222] "Song Sparrow Lincoln's??"
[223] "Song Sparrow(?)"
[224] "Sparrow Hawk"
[225] "species name"
[226] "Starling"
[227] "Steller's Jay"
[228] "Swainson's Thrush"
[229] "Swamp Sparrow"
[230] "Tennesse Warbler"
[231] "Tennessee Warbler"
[232] "Traill's Flycatcher"
[233] "Tree L"
[234] "Tree Sparow"
[235] "Tree Sparrow"
[236] "Tree Swallow"
[237] "TS"
[238] "Tufted Titmouse"
[239] "Unknown"
[240] "Varied Thrush"
[241] "Veery"
[242] "Vesper Sparrow"
[243] "Warbling Vireo"
[244] "White Breasted Nuthatch"
[245] "White-breast Nuthatch"
[246] "White-breasted Nuthatch"
[247] "White-Breasted Nuthatch"
[248] "White-Crested Sparrow"
[249] "White-crown Sparrow"
[250] "White-crowned Sparrow"
[251] "White-eyed Vireo"
[252] "White-Fronted Dove"
[253] "White-thorat Sparrow"
[254] "White-throat Sparrow"
[255] "White-throated Sparrow"
[256] "White-Throated Sparrow"
[257] "White-Throated Sparrows"
[258] "White-winged Junco"
[259] "Wht-brstd Nuthatch"
[260] "Wilson Warbler"
[261] "Wilson's Warbler"
[262] "Winter Wren"
[263] "Wood Pewee"
[264] "Wood Thrush"
[265] "Woodcock"
[266] "Wren"
[267] "Yellow Flicker"
[268] "Yellow Shafted Flicker"
[269] "Yellow Warbler"
[270] "Yellow-bellied Flycatcher"
[271] "Yellow-bellied Sapsucker"
[272] "Yellow-shaft Flicker"
[273] "Yellow-shafted flicker"
[274] "Yellow-shafted Flicker"
[275] "Yellow-tailed Oriole"
[276] "Yellowthroat"
tally(~SpeciesName, data=OrdwayBirdsOrig)
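tally() appears to be a counting helper from the software used with DCF (the mosaic package). A rough base R counterpart is table(). The sketch below uses a handful of spellings copied from the SpeciesName levels listed above. It shows that each spelling variant is counted as its own category, and that lower-casing merges only the variants that differ in capitalization.

```r
# A few spellings taken from the SpeciesName levels listed above
species <- c("Robin", "Robin", "Robn",
             "Black-capped Chickadee", "Black-capped chickadee",
             "Black-Capped Chickadee")

# Count each distinct spelling separately
counts <- table(species)
counts

# Lower-casing merges variants that differ only in capitalization
lowered <- table(tolower(species))
lowered
```

Misspellings such as "Robn" survive lower-casing. Fixing those would need a hand-made lookup from each variant to a standard name.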
data(OrdwayBirdsOrig)
Birds <- subset(OrdwayBirdsOrig,
select=c("Day","Month","SpeciesName"))
names(Birds)
[1] "Day" "Month" "SpeciesName"
Oct <- subset(Birds, Month=="10")
Fall <- subset(Birds, Month %in% c("10","11"))
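Month is a factor whose levels are text, which is why the subsets above compare it with quoted strings. Here is a small hypothetical version of Birds (values invented for illustration). It contrasts == for matching one value with %in% for matching any of several values.

```r
# Hypothetical miniature of Birds; Month is a factor, as in the real data
birds <- data.frame(
  Day = c(16, 3, 21, 9),
  Month = factor(c("7", "10", "11", "10")),
  SpeciesName = c("Song Sparrow", "Tree Sparrow", "Fox Sparrow", "Blue Jay")
)

# == compares each Month label with a single string
oct <- subset(birds, Month == "10")

# %in% keeps rows whose Month label is any of the listed strings
fall <- subset(birds, Month %in% c("10", "11"))

nrow(oct)   # 2
nrow(fall)  # 3
```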
head(Birds,3)
Day Month SpeciesName
3 16 7 Song Sparrow
4
5 16 7 Song Sparrow
sample(Birds,3)
Day Month SpeciesName orig.ids
12558 10 10 Song Sparrow 12556
10989 26 5 Grt-crested Flycatcher 10987
14488 13 7 Red-wing Blackbird 14486
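The sample(Birds,3) call above relies on a data-frame-aware sample(). The orig.ids column suggests one supplied by the DCF software, probably mosaic. Base R's own sample() applied to a data frame would pick columns, not rows. Here is a base R sketch that draws rows at random from a small hypothetical table:

```r
# Hypothetical small table of captures
birds <- data.frame(
  Day = c(16, 3, 21, 9, 12),
  Month = c(7, 10, 11, 10, 5),
  SpeciesName = c("Song Sparrow", "Tree Sparrow", "Fox Sparrow",
                  "Blue Jay", "Robin")
)

head(birds, 3)   # the first three rows, in order

set.seed(1)      # make the random draw reproducible
# Draw 3 distinct row positions, then index rows with them
picked <- birds[sample(nrow(birds), 3), ]
picked           # row names record which original row each case came from
```

The row names play roughly the role that orig.ids plays in the output above.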
A relational database is an important form of organization of data.
Relations are more or less the same as tables.
Relational databases are more or less the same as collections of tables.
SQL (Structured Query Language) is a widely used tool for constructing, operating, and interacting with relational databases.
NOT
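In relational (SQL) terminology, the basic operations on tables are:
PROJECT: choose variables (columns)
SELECT: choose cases (rows)
GROUP: collapse cases into groups, e.g. to count or summarize them
JOIN: combine information from two tables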
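PROJECT, SELECT, and GROUP each have a direct base R counterpart. The sketch below applies them to a hypothetical six-row table of captures. Column indexing, subset(), and table() are one reasonable choice among several; JOIN is left to merge(), which is not shown here.

```r
# Hypothetical table: one case per bird capture
captures <- data.frame(
  Month = c(3, 10, 10, 3, 1, 12),
  SpeciesName = c("Tree Sparrow", "Fox Sparrow", "American Goldfinch",
                  "Tree Sparrow", "Black-capped Chickadee",
                  "Black-capped Chickadee")
)

# PROJECT: keep some variables (columns), all cases
projected <- captures[, "SpeciesName", drop = FALSE]

# SELECT: keep some cases (rows), all variables
selected <- subset(captures, Month == 10)

# GROUP: collapse cases into groups; here, number of captures per month
grouped <- as.data.frame(table(Month = captures$Month))
grouped
```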
Spreadsheet software (e.g. Excel, Google Docs/Forms) is sometimes a good choice for entering data.
However, spreadsheets do not enforce the basic constraints of a data table:
Spreadsheets support a casual and undisciplined approach to organization that can make it hard to carry out analyses.
Don't use a spreadsheet like a tablecloth!
The standard R representation of a table is a data frame.
The R functions highlighted in DCF have a typical form:
value <- operation( [what to do], data=[name of data frame], [additional info])
Within such a function, you can refer to variables by their names.
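Passing the data frame through data= lets the function look variables up inside it by name. Here is a base R sketch with a hypothetical table (aggregate() is used as an example of the operation(what, data=, additional info) shape). It compares that style with spelling out each column as table$variable.

```r
# Hypothetical heights (cm) for a few people
people <- data.frame(sex = c("F", "M", "F", "M"),
                     hgt = c(160, 180, 170, 176))

# Spelling out each column with $
by_dollar <- tapply(people$hgt, people$sex, mean)

# operation(what to do, data = ..., additional info):
# hgt and sex are found by name inside people
by_name <- aggregate(hgt ~ sex, data = people, FUN = mean)
by_name
```

Both give mean heights of 165 for F and 178 for M. The data= form keeps the call short and readable.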
Tables are typically stored in files and read into an R session as a data frame.
Examples in DCF:
data(WakeVotersSmall)
g <- fetchData("grades.csv")
Other possibilities:
Changing the data in the R session does not change the original store.
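Here is a base R sketch of the read-in step. It also shows that edits made in the session leave the stored file alone. read.csv() and write.csv() are standard base R. fetchData() above belongs to the DCF software and is not used here. The file and its contents are hypothetical.

```r
# Write a small hypothetical table to a temporary CSV file (the "original store")
path <- tempfile(fileext = ".csv")
write.csv(data.frame(name = c("Ann", "Bo"), grade = c(88, 92)),
          path, row.names = FALSE)

# Read the file into the R session as a data frame
g <- read.csv(path)

# Modify the copy held in the session
g$grade <- g$grade + 5

# Re-reading the file shows the stored values are unchanged
reread <- read.csv(path)
reread$grade   # 88 92
unlink(path)   # remove the temporary file
```

To make a change permanent, you would write the modified data frame back out, for example with write.csv().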
Example: nhanes
Body measurement data. Create a new “surface area” measure based on a cylindrical model: waist circumference \( \times \) height.
data(nhanes) # read the data
nhanes <- transform(nhanes, area=wst*hgt)
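The transform() call finds wst and hgt inside the data frame by name. The sketch below uses two hypothetical people. The values are invented and assumed to be in centimeters, so area comes out in square centimeters.

```r
# Hypothetical body measurements (cm); not values from nhanes
meas <- data.frame(wst = c(80, 95), hgt = c(170, 160))

# Lateral area of a cylinder = circumference * height
meas <- transform(meas, area = wst * hgt)
meas$area   # 13600 15200
```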
Required inputs:
Combine information from two tables with (possibly) different cases.
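In base R, merge() is one way to join two tables on a shared variable. In the sketch below, both tables are hypothetical: one has a case per capture and the other a case per species. The family assignments are real, but the tables themselves are illustrative.

```r
# One case per capture
captures <- data.frame(
  SpeciesName = c("Tree Sparrow", "Fox Sparrow", "Robin"),
  Month = c(3, 10, 5)
)

# One case per species (a lookup table)
families <- data.frame(
  SpeciesName = c("Tree Sparrow", "Fox Sparrow", "Blue Jay"),
  Family = c("Passerellidae", "Passerellidae", "Corvidae")
)

# Inner join: keep only species that appear in both tables
inner <- merge(captures, families, by = "SpeciesName")

# Left join: keep every capture; Family is NA where there is no match
left <- merge(captures, families, by = "SpeciesName", all.x = TRUE)

nrow(inner)  # 2
nrow(left)   # 3
```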
From the Ordway Bird data to this graphic:
Month | Species Name |
---|---|
3 | Tree Sparrow |
10 | Fox Sparrow |
10 | American Goldfinch |
3 | Tree Sparrow |
1 | Black-capped Chickadee |
12 | Black-capped Chickadee |
Month | Species Name | Count |
---|---|---|
3 | Tree Sparrow | 7 |
10 | Fox Sparrow | 21 |
10 | American Goldfinch | 9 |
1 | Black-capped Chickadee | 1 |
12 | Black-capped Chickadee | 3 |
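Going from the first table (one case per capture) to the second (one case per month and species) is a grouping step. The sketch below applies base R aggregate() to only the six example rows shown. Because of that, its counts are smaller than the Count column above, which reflects the full data.

```r
# The six example rows from the first table
raw <- data.frame(
  Month = c(3, 10, 10, 3, 1, 12),
  SpeciesName = c("Tree Sparrow", "Fox Sparrow", "American Goldfinch",
                  "Tree Sparrow", "Black-capped Chickadee",
                  "Black-capped Chickadee")
)

raw$n <- 1   # each row is one capture

# Sum the 1s within each Month-SpeciesName combination
counts <- aggregate(n ~ Month + SpeciesName, data = raw, FUN = sum)
counts
```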
Make the bird graphic.
Hints:
Put the speciesCount table in an appropriate form.
Use mBar(speciesCount) to generate the graph.
Other syntax for mavens:
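mBar() is part of the DCF software, and its exact input requirements are not shown here. As a base R stand-in, one "appropriate form" is a species-by-month grid built with xtabs() from the Count table above, which barplot() can draw directly. Here speciesCount is assumed to be a data frame with Month, SpeciesName, and Count columns.

```r
# Assumed form of speciesCount, using the counts from the table above
speciesCount <- data.frame(
  Month = c(3, 10, 10, 1, 12),
  SpeciesName = c("Tree Sparrow", "Fox Sparrow", "American Goldfinch",
                  "Black-capped Chickadee", "Black-capped Chickadee"),
  Count = c(7, 21, 9, 1, 3)
)

# Reshape to a species-by-month grid; unlisted combinations become 0
byMonth <- xtabs(Count ~ SpeciesName + Month, data = speciesCount)

# One group of bars per month, one bar per species
barplot(byMonth, beside = TRUE, legend.text = TRUE,
        xlab = "Month", ylab = "Count")
```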