Broad Description

Your task is to find, for each country, the fraction of the population that lives in cities with more than 100,000 inhabitants.

The Data

For the purposes of this exercise, you will use three data frames:

require(DCFdevel)
require(mosaic)
data(WorldCities) # Creates WorldCities
TotalPop <- CIAdata(2119) # population data
CountryCodes <- fetchData("countrycodes.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/countrycodes.csv

Each of these data frames identifies countries, but possibly in different forms. By grouping, summarizing, joining, and doing arithmetic, you should be able to construct the answer to the problem.
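As a rough illustration of the grouping and summarizing step, here is a minimal sketch on a small made-up stand-in for WorldCities. The column names follow the WorldCities sample shown later in this document; the rows and the name Cities are invented for the example, and this is not necessarily how the intended solution is written.

```r
library(dplyr)

# Hypothetical stand-in for WorldCities: one row per city.
# All rows are invented for illustration.
Cities <- data.frame(
  Country    = c("BR", "IN", "IQ", "IQ", "BR"),
  Population = c(38569, 18739, 434450, 250000, 120000),
  stringsAsFactors = FALSE
)

# Keep only cities above the threshold, then total them per country.
UrbanPop <- Cities %>%
  filter(Population > 100000) %>%
  group_by(Country) %>%
  summarize(UrbanPop = sum(Population))
```

Countries with no city above the threshold (IN in this toy data) drop out of the result entirely, which matters later if they should appear with an urban fraction of zero.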

ISO2 and ISO3 are different, but related, official standards for identifying countries.

Tools

The dplyr functions such as group_by(), mutate(), and summarize(), together with DCFdevel::toISO3(), which converts country names to ISO3 codes. See the documentation for that function. Here’s an example:

toISO3(c('Canada','England','Great Britain', 'United Kingdom'))
## [1] CAN  <NA> GBR  GBR 
## 248 Levels:  ABW AFG AGO AIA ALA ALB AND ARE ARG ARM ASM ATA ATF ... ZWE

Process

Before writing the computer commands to perform the operation:

A Solution

Here are the input files (with just the relevant variables):

TotalPop

Each case is a country:

##     country      pop
##      Rwanda 12337138
##  Kazakhstan 17948816
##     Moldova  3583288

WorldCities

Each case is a city:

##  Country Population
##       BR      38569
##       IN      18739
##       IQ     434450

The country is identified by a 2-letter ISO code.

CountryCodes

Each case is a country:

##  iso3 iso2
##   SVK   SK
##   GUY   GY
##   MTQ   MQ
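Because WorldCities identifies countries by ISO2 code, CountryCodes can serve as a lookup table for translating to ISO3. The following is a minimal sketch of that translation using a join. The CountryCodes rows are copied from the sample above; the Cities rows, including the code "XX", are hypothetical.

```r
library(dplyr)

# Rows copied from the CountryCodes sample above.
CountryCodes <- data.frame(
  iso3 = c("SVK", "GUY", "MTQ"),
  iso2 = c("SK", "GY", "MQ"),
  stringsAsFactors = FALSE
)

# Hypothetical city rows keyed by ISO2 (populations are made up;
# "XX" is a deliberately unmatched code).
Cities <- data.frame(
  Country    = c("SK", "GY", "XX"),
  Population = c(150000, 120000, 110000),
  stringsAsFactors = FALSE
)

# Attach the ISO3 code to each city; unmatched ISO2 codes get NA.
CitiesISO3 <- Cities %>%
  left_join(CountryCodes, by = c("Country" = "iso2"))
```

A left join keeps every city, so rows whose iso3 is NA show which codes failed to match. An inner join would silently drop them.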

General plan:

  • Create a data table, UrbanPopByCountry, which has the urban population (the total population of cities larger than 100,000) for each country

  • Combine UrbanPopByCountry with TotalPopByCountry, the country-wide population for each country. The result will look like

    Country UrbanPop TotalPop
    BEL     7221658  10449361
    ….
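The final step of the plan, joining the two tables and dividing, can be sketched as follows. This assumes both tables are already keyed by the same ISO3 code in a column named Country. The BEL figures come from the example row above; the "XYZ" row and the names Result and UrbanFraction are made up for illustration.

```r
library(dplyr)

# Urban population per country (BEL from the example row; XYZ is invented).
UrbanPopByCountry <- data.frame(
  Country  = c("BEL", "XYZ"),
  UrbanPop = c(7221658, 500000),
  stringsAsFactors = FALSE
)

# Country-wide population, keyed by the same code.
TotalPopByCountry <- data.frame(
  Country  = c("BEL", "XYZ"),
  TotalPop = c(10449361, 2000000),
  stringsAsFactors = FALSE
)

# Join on the shared key, then compute the urban fraction.
Result <- UrbanPopByCountry %>%
  inner_join(TotalPopByCountry, by = "Country") %>%
  mutate(UrbanFraction = UrbanPop / TotalPop)
```

For the BEL row this gives a fraction of about 0.69. In the real data, TotalPop identifies countries by name, so those names would first need converting to ISO3 (for example with toISO3()) before the join can match.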