Your task is to find the fraction of the population in each country that lives in cities with more than 100,000 inhabitants.
For the purposes of this exercise, you will use three data frames:
require(DCFdevel)
require(mosaic)
data(WorldCities) # Creates WorldCities
TotalPop <- CIAdata(2119) # population data
CountryCodes <- fetchData("countrycodes.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/countrycodes.csv
Each of these frames has information about the identity of a country, but perhaps in different forms. By grouping, summarizing, joining, and doing arithmetic, you should be able to construct the answer to the problem.
ISO2 and ISO3 are different, but related, official standards for identifying countries.
You will need the dplyr functions such as group_by(), mutate(), summarize(), etc., and the DCFdevel::toISO3() function, which converts country names to ISO3 codes. See the documentation for that function. Here’s an example:
toISO3(c('Canada','England','Great Britain', 'United Kingdom'))
## [1] CAN <NA> GBR GBR
## 248 Levels: ABW AFG AGO AIA ALA ALB AND ARE ARG ARM ASM ATA ATF ... ZWE
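Note that England comes back as NA in the output above: names the function does not recognize are left unmatched. Before joining on ISO3 codes, it can help to list the names that failed to match. The sketch below uses a small hand-written lookup vector, name_to_iso3, as a stand-in for toISO3() (an assumption, made so the example runs without DCFdevel); the real function covers far more names and returns a factor rather than a character vector.

```r
# Stand-in for toISO3(): a hand-written lookup, NOT the DCFdevel implementation.
name_to_iso3 <- c("Canada" = "CAN",
                  "Great Britain" = "GBR",
                  "United Kingdom" = "GBR")

country_names <- c("Canada", "England", "Great Britain", "United Kingdom")

# Indexing a named vector by name gives NA for names that are not present.
iso3 <- unname(name_to_iso3[country_names])

# Names that failed to match and would be lost in a later join on iso3.
unmatched <- country_names[is.na(iso3)]
print(unmatched)
```

With the real data, the same is.na() check can be applied to the result of toISO3() on the country names in TotalPop.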
Before writing the computer commands to perform the operation, look at the input files (with just the relevant variables shown):
#### TotalPop
Each case is a country:
## country pop
## Rwanda 12337138
## Kazakhstan 17948816
## Moldova 3583288
#### WorldCities
Each case is a city:
## Country Population
## BR 38569
## IN 18739
## IQ 434450
The country is identified by a 2-letter ISO code, such as BR in the rows above.
#### CountryCodes
Each case is a country:
## iso3 iso2
## SVK SK
## GUY GY
## MTQ MQ
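Because the city data identifies countries by ISO2 code and CountryCodes lists both codes, a join can attach the ISO3 code to each city. Here is a minimal sketch with dplyr, using the three city rows printed above and a hand-typed lookup table. The data frame names Cities and Codes are placeholders, and the column names are assumed to match the printed output.

```r
library(dplyr)

# The three city rows printed above (Country holds the ISO2 code).
Cities <- data.frame(Country = c("BR", "IN", "IQ"),
                     Population = c(38569, 18739, 434450),
                     stringsAsFactors = FALSE)

# Hand-typed stand-in for CountryCodes, limited to the codes needed here.
Codes <- data.frame(iso3 = c("BRA", "IND", "IRQ"),
                    iso2 = c("BR", "IN", "IQ"),
                    stringsAsFactors = FALSE)

# Match Cities$Country against Codes$iso2; every city row is kept,
# and iso3 is NA wherever no code matches.
CitiesISO3 <- Cities %>%
  left_join(Codes, by = c("Country" = "iso2"))

print(CitiesISO3)
```

A left join is used here so that cities with an unrecognized code stay visible as NA instead of silently disappearing.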
General plan:
1. Create a data table, UrbanPopByCountry, which has the urban population (the total population of cities larger than 100,000) for each country.
2. Combine UrbanPopByCountry with TotalPopByCountry, the country-wide population for each country. This will look like:
Country | UrbanPop | TotalPop |
---|---|---|
BEL | 7221658 | 10449361 |
… | … | … |
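The plan above can be sketched on made-up data. Every number and country code below is invented for illustration, TotalPopByCountry is assumed to already carry an iso3 column, and the final mutate() step (dividing urban by total population) is one possible way to finish the task, not a step spelled out in the plan.

```r
library(dplyr)

# Made-up cities; iso3 is assumed to have been attached already.
Cities <- data.frame(iso3 = c("AAA", "AAA", "AAA", "BBB", "BBB"),
                     Population = c(500000, 250000, 40000, 150000, 90000),
                     stringsAsFactors = FALSE)

# Made-up country totals.
TotalPopByCountry <- data.frame(iso3 = c("AAA", "BBB"),
                                TotalPop = c(1000000, 600000),
                                stringsAsFactors = FALSE)

# Step 1: keep cities above 100,000 and add them up per country.
UrbanPopByCountry <- Cities %>%
  filter(Population > 100000) %>%
  group_by(iso3) %>%
  summarise(UrbanPop = sum(Population))

# Step 2: combine with the country-wide totals, then take the ratio.
Result <- UrbanPopByCountry %>%
  inner_join(TotalPopByCountry, by = "iso3") %>%
  mutate(UrbanFraction = UrbanPop / TotalPop)

print(Result)
```

Countries with no city above the threshold drop out of UrbanPopByCountry in this sketch; starting from the totals table and using left_join() would keep them, with an NA urban population.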