An exercise on data operations

Broad Description

Your task is to find, for each country, the fraction of the population that lives in cities with more than 100,000 inhabitants.

The Data

For the purposes of this exercise, you will use three data frames:

require(DCFdevel)
require(mosaic)
data(WorldCities) # Creates WorldCities
TotalPop <- CIAdata(2119) # population data
CountryCodes <- fetchData("countrycodes.csv")

## Retrieving from http://www.mosaic-web.org/go/datasets/countrycodes.csv

Each of these data frames identifies countries, but not necessarily in the same form. By combining grouping, summarizing, joining, and arithmetic, you should be able to construct the answer to the problem.
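As a rough illustration of the grouping and summarizing part, here is a minimal sketch on a toy stand-in for WorldCities. The data frame `cities` and the output column `UrbanPop` are names chosen for this example; the first population for each country comes from the sample rows shown later, and the rest are made up.

```r
library(dplyr)

# Toy stand-in for WorldCities (some rows are invented for illustration).
cities <- data.frame(
  Country    = c("BR", "BR", "IN", "IQ", "IQ"),
  Population = c(38569, 250000, 18739, 434450, 120000),
  stringsAsFactors = FALSE
)

UrbanPopByCountry <- cities %>%
  filter(Population > 100000) %>%        # keep only cities above the threshold
  group_by(Country) %>%                  # one group per 2-letter country code
  summarise(UrbanPop = sum(Population))  # total large-city population per country

UrbanPopByCountry
```

Note that a country with no city above the threshold (IN in this toy data) drops out of the result entirely, which matters when the table is later joined to the country-wide totals.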

ISO2 and ISO3 are different, but related, official standards for identifying countries: ISO2 uses two-letter codes and ISO3 uses three-letter codes.

Tools

The dplyr functions such as group_by(), mutate(), summarize(), etc., together with DCFdevel::toISO3(), which converts country names to ISO3 codes. See the documentation for that function. Here’s an example:

toISO3(c('Canada','England','Great Britain', 'United Kingdom'))

## [1] CAN  <NA> GBR  GBR 
## 248 Levels:  ABW AFG AGO AIA ALA ALB AND ARE ARG ARM ASM ATA ATF ... ZWE
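The output above shows that a name the converter does not recognize ('England') comes back as NA. One way to spot such names before joining is sketched below. Because DCFdevel may not be installed, the sketch uses a hypothetical stand-in, `to_iso3_toy()`, built from a three-entry lookup vector; it is not the real toISO3() and returns character values rather than a factor.

```r
# Hypothetical stand-in for DCFdevel::toISO3(), for illustration only.
lookup <- c("Canada" = "CAN", "Great Britain" = "GBR", "United Kingdom" = "GBR")
to_iso3_toy <- function(x) unname(lookup[x])  # unknown names give NA

country_names <- c("Canada", "England", "Great Britain", "United Kingdom")
codes <- to_iso3_toy(country_names)

# Names that failed to convert and would be lost in a later join
unmatched <- country_names[is.na(codes)]
unmatched
```

With the real function, the same `is.na()` check on its result should serve the same purpose.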

Process

Before writing the computer commands to perform the operation:

Look at the names and contents of the variables in the different data frames.
Draw a flow diagram of what processes will be required.
Only then, translate your process into computer instructions.
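For the first step, a quick inspection might look like the sketch below. It rebuilds small versions of two of the data frames from the sample rows shown in the solution, so it runs without the course packages; with the real data you would call the same functions on TotalPop, WorldCities, and CountryCodes directly.

```r
# Small copies of two inputs, using the sample rows from this handout.
TotalPop <- data.frame(
  country = c("Rwanda", "Kazakhstan", "Moldova"),
  pop     = c(12337138, 17948816, 3583288),
  stringsAsFactors = FALSE
)
WorldCities <- data.frame(
  Country    = c("BR", "IN", "IQ"),
  Population = c(38569, 18739, 434450),
  stringsAsFactors = FALSE
)

names(TotalPop)     # variable names
str(WorldCities)    # variable types and a preview of the values
head(TotalPop, 3)   # first few cases

# Do the two frames share a variable name that could serve as a join key?
shared <- intersect(names(TotalPop), names(WorldCities))
shared
```

Here `shared` is empty: `country` and `Country` differ in case, and they also hold different kinds of identifiers (full names versus 2-letter codes), so a translation step is needed before any join.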

A Solution

Here are the input data frames (with just the relevant variables):

TotalPop

Each case is a country:

##     country      pop
##      Rwanda 12337138
##  Kazakhstan 17948816
##     Moldova  3583288

WorldCities

Each case is a city:

##  Country Population
##       BR      38569
##       IN      18739
##       IQ     434450

The country is identified by a 2-letter ISO code, as in the sample rows above.

CountryCodes

Each case is a country:

##  iso3 iso2
##   SVK   SK
##   GUY   GY
##   MTQ   MQ
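CountryCodes can act as a translation table between the two code systems. The sketch below, which assumes dplyr is available, attaches an `iso3` column to a table keyed by 2-letter codes. The `urban` data frame and its values are invented for illustration, and "XX" is a deliberately unknown code.

```r
library(dplyr)

# Sample rows of CountryCodes from this handout
CountryCodes <- data.frame(
  iso3 = c("SVK", "GUY", "MTQ"),
  iso2 = c("SK", "GY", "MQ"),
  stringsAsFactors = FALSE
)

# Invented per-country values keyed by 2-letter code
urban <- data.frame(
  Country  = c("SK", "GY", "XX"),
  UrbanPop = c(1000000, 200000, 50000),
  stringsAsFactors = FALSE
)

# Match urban$Country against CountryCodes$iso2 and bring iso3 along
urban_iso3 <- urban %>%
  left_join(CountryCodes, by = c("Country" = "iso2"))

urban_iso3
```

A left join keeps every row of `urban`, so the unknown code shows up with a missing `iso3` instead of silently disappearing, which makes mismatches easy to find.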

General plan:

Create a data table, UrbanPopByCountry, which has the urban population for each country.
Combine UrbanPopByCountry with TotalPopByCountry, which holds the country-wide population for each country. This will look like

Country   UrbanPop   TotalPop
BEL        7221658   10449361
…                …          …
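Once the two tables share a country key, the last step is a join followed by a division. The sketch below uses only the BEL row shown above and assumes both tables already carry an `iso3` column; the column names `iso3`, `TotalPop`, and `frac_urban` are choices made for this example.

```r
library(dplyr)

UrbanPopByCountry <- data.frame(iso3 = "BEL", UrbanPop = 7221658,
                                stringsAsFactors = FALSE)
TotalPopByCountry <- data.frame(iso3 = "BEL", TotalPop = 10449361,
                                stringsAsFactors = FALSE)

UrbanFraction <- UrbanPopByCountry %>%
  inner_join(TotalPopByCountry, by = "iso3") %>%  # keep countries present in both
  mutate(frac_urban = UrbanPop / TotalPop)        # fraction living in large cities

UrbanFraction
```

For the BEL row this gives a fraction of roughly 0.69. An inner join drops countries missing from either table; a left join from TotalPopByCountry would instead keep every country and leave the fraction missing where no large cities were found.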