State by State

Using the ZipGeography data table, answer the following questions. In addition to the answer itself, show the statement that you used and the data table created by your statement that contains the answer.

Babies and the Bible

You can access a list of Bible-related first names with this command:

BibleNames <- fetchData("DCF/BibleNames.csv")

It looks like this:

     name                                 meaning
1   Aaron  a teacher; lofty; mountain of strength
2 Abaddon                           the destroyer
3 Abagtha                father of the wine-press
4   Abana               made of stone; a building
5  Abarim                    passages; passengers
6    Abba                                  father

  1. Using BibleNames and BabyNames:
    • Create a data table, BibleCount, that gives, for each sex and each year, the number of babies born with Bible-related names.
    • Make a meaningful graphic from BibleCount that displays these counts.
    • Create a data table, BibleGirls, that lists biblical names that are used primarily for girls.

Gender-Neutral Names

The BabyNames data table looks like this:

     name sex count year
1 Silvano   M    20 2000
2   Edgar   M   721 1964
3   Adiel   M     8 1984
4  Matias   M    32 1975
5   Nuria   F     8 2012
6    Holt   M     7 1962

Turn this into a wide-format table that looks like this:

     name year     F    M
1     Ada 1912  1268    6
2    Adam 2000    16 8132
3    Alex 1999   278 6826
4   Alice 1883  1488    6
5   Alice 1923 11330   27
6 Allison 1933    15   40

Your statement will have the following form. Replace ??? with the appropriate variable name.

BothSexes <-
  BabyNames %>%
  spread( key=???, value=count ) %>%
  filter( F>1, M>1 )
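
For readers who have not used spread() before, here is a minimal sketch on a made-up table. The table Sales and its columns store, quarter, and sales are invented for illustration and are unrelated to BabyNames. The sketch assumes the tidyr package is installed, and it does not fill in the ??? above.

```r
library(tidyr)  # assumed to be installed; provides spread()

# Hypothetical narrow-format table: one row per store per quarter.
Sales <- data.frame(
  store   = c("North", "North", "South", "South"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  sales   = c(10, 12, 7, 9)
)

# Each distinct value of the key variable becomes its own column,
# filled with the matching entry of the value variable.
Wide <- spread(Sales, key = quarter, value = sales)
Wide
```

The result has one row per store and one column per quarter, which is the same narrow-to-wide change of shape that the BabyNames statement performs.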

Now that you have BothSexes:

  1. Find the 10 names with the closest balance between females and males. You can define gender-balance quantitatively as abs(log(F/M)). The smaller this number, the more balanced the name count.
  2. Find the 10 names with the closest balance that have more than 100 babies of either sex.
  3. Extra Credit. Find the 10 names with the closest balance that have lasting popularity. Define lasting popularity as having more than 100 babies in at least 20 of the years.
  4. Extra Credit. Are there names that have switched gender over the years? Find, for each name and each year, the gender ratio. Pull out the names whose maximum and minimum ratios over the years differ by a large amount. Then plot the ratio for those names over time.
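
To get a feel for the balance measure abs(log(F/M)) from item 1, it can help to evaluate it on a few counts by hand. The sketch below uses a small made-up table, Toy, whose names and counts are invented for illustration and are not taken from BabyNames. It uses only base R.

```r
# Toy counts, invented for illustration; not taken from BabyNames.
Toy <- data.frame(
  name = c("A", "B", "C"),
  F    = c(100, 200, 100),
  M    = c(100, 100, 200)
)

# Gender balance as defined in the exercise: abs(log(F/M)).
Toy$balance <- abs(log(Toy$F / Toy$M))
Toy
```

Name A, with equal counts, gets a balance of 0. Names B and C get the same value, log(2), because taking the absolute value of the log makes a 2-to-1 ratio count the same in either direction.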


Written by Daniel Kaplan for the Data & Computing Fundamentals Course. Development was supported by grants from the National Science Foundation for Project Mosaic (NSF DUE-0920350) and from the Howard Hughes Medical Institute.