The OrdwayBirds data frame is a historical record of birds captured and released at the Katharine Ordway Natural History Study Area, a 278-acre preserve in Inver Grove Heights, Minnesota, owned and managed by Macalester College. Originally written by hand in a field notebook, the entries have been transcribed into electronic format under the supervision of Jerald Dosch, Dept. of Biology, Macalester College.
Due to mistakes in data entry, the SpeciesName variable needs some fixing. SpeciesName is intended to identify the species of each bird, but the spelling often varies among birds of the same biological species. This leads to mis-classification of birds. There are also problems with the Month and Day variables; they are supposed to be numerical, but mistakes prevent them from being correctly identified as such.
Fortunately, all these errors are easy to correct. The data frame OrdwaySpeciesNames collects together all the variant spellings. Entry by entry, each mis-spelling was translated (by a human) into a standardized spelling. Thus, a join can be used to correct the mis-spellings in the OrdwayBirds data.
You are going to look at the month-to-month presence of different species. Think of your assignment as creating a manual for birders to guide them to the correct time of year to visit Ordway to see a particular species.
There are many variables that you won’t need for this activity, and you still have to fix the Month and Day variables. To keep things simple, cut and paste this command into a chunk at the start of the document.
The OrdwayBirds data are available in the dcData package.
OrdwayBirds <- OrdwayBirds %>%
  select( SpeciesName, Month, Day ) %>%
  mutate( Month = as.numeric(as.character(Month)),
          Day = as.numeric(as.character(Day)))
The mutate() step is part of the data cleaning process, converting Month and Day to numerical variables, as originally intended by the folks entering the data.
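To see why the conversion goes through as.character() first, here is a minimal sketch on a toy factor. The values are invented for illustration and are not taken from OrdwayBirds; the sketch also assumes the column was read as a factor, which depends on your R version and on how the data were loaded.

```r
# Toy factor standing in for a column that was read as text because of a typo.
day <- factor(c("12", "3", "7", "3", "1O"))  # "1O" contains a letter O, not a zero

# as.numeric() applied directly to a factor returns the internal level codes,
# not the values that are printed.
codes <- as.numeric(day)

# Converting to character first recovers the printed values. Entries that are
# not valid numbers become NA (R normally warns about this).
values <- suppressWarnings(as.numeric(as.character(day)))

print(codes)
print(values)
```

If the column is already a character vector, the extra as.character() call does no harm.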
Including mis-spellings, how many different species are there in the OrdwayBirds data?
The OrdwaySpeciesNames data frame is also found in the dcData package.
How many distinct species are there in the SpeciesNameCleaned variable in OrdwaySpeciesNames? You will find it helpful to use n_distinct(), a reduction function that counts the number of unique values in a variable.
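As a rough illustration of how n_distinct() behaves, the sketch below applies it to a small made-up table. The spellings are hypothetical and the column name is reused only for familiarity.

```r
library(dplyr)

# Hypothetical spellings, invented for illustration (not taken from OrdwayBirds).
toy <- data.frame(
  SpeciesName = c("Robin", "robin", "Robin", "Blue Jay", "Bluejay", NA),
  stringsAsFactors = FALSE
)

# n_distinct() is a reduction function: many values in, a single number out.
result <- toy %>%
  summarise(
    spellings         = n_distinct(SpeciesName),               # NA counts as a value
    spellings_no_miss = n_distinct(SpeciesName, na.rm = TRUE)  # NA is ignored
  )

print(result)
```

Note that "Robin" and "robin" count as different values, which is exactly why variant spellings inflate the apparent number of species.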
Use the OrdwaySpeciesNames table to create a new data frame that corrects the mis-spellings in SpeciesName. This can be done easily using the inner_join() data verb.
Corrected <- OrdwayBirds %>%
  inner_join( OrdwaySpeciesNames ) %>%
  select( Species = SpeciesNameCleaned, Month, Day ) %>%
  na.omit() # cleaned up the missing ones
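The matching behaviour of inner_join() may be easier to see on a small example. The two tables below are hypothetical stand-ins that only mimic the shape of the real data: one holds raw captures, the other translates each variant spelling into a standardized one.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration.
captures <- data.frame(
  SpeciesName = c("Robin", "robin", "Bluejay", "Unknwn"),
  Month       = c(4, 5, 5, 6),
  stringsAsFactors = FALSE
)
translation <- data.frame(
  SpeciesName        = c("Robin", "robin", "Bluejay"),
  SpeciesNameCleaned = c("Robin", "Robin", "Blue Jay"),
  stringsAsFactors = FALSE
)

# Variables present in both tables are the ones a natural join matches on.
shared <- intersect(names(captures), names(translation))

# Naming the key explicitly avoids relying on the automatic choice.
fixed <- captures %>%
  inner_join(translation, by = "SpeciesName") %>%
  select(Species = SpeciesNameCleaned, Month)

print(shared)
print(fixed)
```

Because this is an inner join, a capture whose spelling has no entry in the translation table (here "Unknwn") is dropped from the result.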
Look at the names of the variables in OrdwaySpeciesNames and OrdwayBirds.
How many bird captures are reported for each of the (corrected) species?
Store the total in a variable called count. Arrange this into descending order, starting with the species with the most birds, and look through the list. Hint: Remember n(). Also, one of the arguments to one of the data verbs will be desc(count) to arrange the cases into descending order. Display the top 10 species in terms of the number of bird captures.
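The pattern in the hint can be sketched on made-up data as follows. The records are hypothetical; only the verbs (group_by(), summarise(), n(), arrange(), desc()) carry over to the real table.

```r
library(dplyr)

# Hypothetical capture records, invented for illustration.
toy <- data.frame(
  Species = c("Robin", "Robin", "Blue Jay", "Robin", "Catbird", "Blue Jay"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  group_by(Species) %>%
  summarise(count = n()) %>%   # n() counts the cases in each group
  arrange(desc(count))         # largest count first

print(head(counts, 10))        # shows at most the first 10 rows
```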
Define for yourself a “major species” as a species with more than a particular threshold count. Set your threshold so that there are 5 or 6 species designated as major.
Filter to produce a data frame with only the birds that belong to a major species. Hint: Remember that summary functions can be used case-by-case when filtering or mutating a data frame that has been grouped.
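One way to read the hint: inside a grouped data frame, a summary function such as n() is evaluated once per group, and its value is then applied to every case in that group. A minimal sketch on hypothetical data, with a threshold chosen only to suit the toy table:

```r
library(dplyr)

# Hypothetical capture records, invented for illustration.
toy <- data.frame(
  Species = c("Robin", "Robin", "Blue Jay", "Robin", "Catbird", "Blue Jay"),
  Month   = c(4, 5, 5, 6, 7, 9),
  stringsAsFactors = FALSE
)

threshold <- 1  # hypothetical cutoff that only makes sense for this toy data

kept <- toy %>%
  group_by(Species) %>%
  filter(n() > threshold) %>%  # n() is evaluated within each species group
  ungroup()

print(kept)
```

The result keeps the individual capture rows rather than one row per species, so the month of each capture remains available for later steps.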
Save the output in a table called Majors.
When you have correctly produced
Majors, write a command that produces the month-by-month count of each of the major species. Call this table
Display this month-by-month count with a bar chart arranged in a way that you think tells the story of what time of year the various species appear. You can use mplot() to explore different possibilities. Warning: mplot() and similar interactive functions should not appear in your Rmd file; they need to be used interactively from the console. Use the “Show Expression” button in mplot() to create an expression that you can cut and paste into a chunk in your Rmd document, so that the graph gets created when you compile it.
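For orientation, a non-interactive plotting expression suitable for an Rmd chunk might look like the sketch below. It assumes ggplot2 as the plotting system, and the table toy_counts with its columns is hypothetical; the expression you copy from mplot() may well differ.

```r
library(ggplot2)

# Hypothetical month-by-month counts, invented for illustration.
toy_counts <- data.frame(
  Species = rep(c("Robin", "Blue Jay"), each = 3),
  Month   = rep(c(4, 5, 6), times = 2),
  count   = c(10, 25, 8, 3, 4, 12),
  stringsAsFactors = FALSE
)

# Month is treated as a category so that each month gets its own set of bars.
p <- ggplot(toy_counts, aes(x = factor(Month), y = count, fill = Species)) +
  geom_col(position = "dodge") +   # bars side by side within each month
  labs(x = "Month", y = "Number of captures")

print(p)
```

Placing the species side by side within each month is only one arrangement; faceting by species is another that may tell the seasonal story more clearly.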
Once you have the graph, use it to answer these questions: