The World Bank maintains data on migration between countries, based on censuses and other information. The data are available in R as the MigrationFlows data table in the DCFdevel package.

Here’s a small sample of the data:

Referring to the sample, you can see that in the year 2000 there were 24,118 males who moved from Poland to Kazakhstan. In that year, there were 2,289 females who moved from Indonesia to Australia.
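To make that reading concrete, here is a minimal sketch that looks up one such flow with dplyr. It uses a two-row stand-in table built only from the two figures quoted above, not MigrationFlows itself. The column names (sex, destcode, origincode, Y2000) and the three-letter country codes are assumptions about how the real table is laid out, so check them with names(MigrationFlows) before adapting the code.

```r
library(dplyr)

# Stand-in for MigrationFlows: only the two flows quoted in the text.
# Column names and country codes are ASSUMED, not taken from the package.
flows_sample <- data.frame(
  sex        = c("Male", "Female"),
  destcode   = c("KAZ", "AUS"),
  origincode = c("POL", "IDN"),
  Y2000      = c(24118, 2289),
  stringsAsFactors = FALSE
)

# Males who moved from Poland (origin) to Kazakhstan (destination) in 2000
pol_to_kaz <- flows_sample %>%
  filter(sex == "Male", origincode == "POL", destcode == "KAZ")

pol_to_kaz$Y2000
```

Each row is one combination of sex, origin, and destination, and the year-2000 count sits in its own column.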

Basic Questions

Structure of data

  • How many variables are there?

  • Is MigrationFlows in “wide” or “narrow” format with respect to the years?
  • How many cases are there?

  • How many countries are there?

  • Construct a table that combines the females and males for each country pair for the year 2000.
    • The resulting data table will have about 50,000 rows.
    • Imagine what the table containing the result will look like. How many rows will it have? Is the meaning of a row the same in the result as in MigrationFlows, or different? Which variables from MigrationFlows will you use to construct the result table? What will the variables in the result table be?
    • Hint: use sum() to add numbers together, along with group_by() and summarize().
      Use head() or sample_n() to display just a few rows of your result.
  • How many migrants originated in each country in 2000? (That is, how many emigrants did each country have?)

  • Which 5 countries had the largest number of emigrants? (Hint: arrange(), desc())

  • Compute the fraction of each origin country’s year-2000 emigration that goes to each destination country.
    • In the result table, what will be the cases? Are they the same as in the original table? What will be the variables? Which ones are the same as in the original table?

    • Explain why mutate() and not summarize() was used here.
    • Is the select() necessary for finding the outPercent variable?

  • For each origin country, what is the largest destination?

  • For each destination country, what is the largest origin?

  • For each destination country, what are the two largest origin countries?
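The wrangling patterns these questions call for can be sketched on a toy table. This is only an illustration under stated assumptions: the counts below are made up, the country codes AAA, BBB, and CCC are placeholders, and the column names (sex, origincode, destcode, Y2000) are assumed to match MigrationFlows. The name outPercent follows the question text, and computing it as a percentage rather than a proportion is also an assumption.

```r
library(dplyr)

# Toy stand-in for MigrationFlows with MADE-UP counts and placeholder codes.
toy <- data.frame(
  sex        = rep(c("Male", "Female"), each = 4),
  origincode = rep(c("AAA", "AAA", "BBB", "BBB"), times = 2),
  destcode   = rep(c("BBB", "CCC", "AAA", "CCC"), times = 2),
  Y2000      = c(10, 30, 5, 15, 20, 40, 5, 25),
  stringsAsFactors = FALSE
)

# Combine females and males: one row per origin-destination pair
both_sexes <- toy %>%
  group_by(origincode, destcode) %>%
  summarize(Y2000 = sum(Y2000), .groups = "drop")

# Total emigrants per origin country, largest first
emigrants <- both_sexes %>%
  group_by(origincode) %>%
  summarize(total_out = sum(Y2000)) %>%
  arrange(desc(total_out))

# Share of each origin's emigration going to each destination.
# In a grouped mutate(), sum(Y2000) is taken within each origin country.
fractions <- both_sexes %>%
  group_by(origincode) %>%
  mutate(outPercent = 100 * Y2000 / sum(Y2000)) %>%
  ungroup()

# Largest destination for each origin (ties, if any, are all kept)
top_dest <- both_sexes %>%
  group_by(origincode) %>%
  slice_max(Y2000, n = 1) %>%
  ungroup()

# Two largest origins for each destination
top_two_origins <- both_sexes %>%
  group_by(destcode) %>%
  slice_max(Y2000, n = 2) %>%
  ungroup()

head(fractions)
```

On the real table, use head() or sample_n() on each intermediate result to check that the rows mean what you expect before moving on to the next step.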

Bringing in other data

Maps

Networks

On your own …

Countries where the size of the exchange is balanced to within a factor of 3.3. Red is for females; blue for males.
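One way to find such balanced exchanges is to pair each flow with the flow in the opposite direction and compare the two counts. The sketch below reads "balanced to within a factor of 3.3" as the larger flow being at most 3.3 times the smaller one, which is an interpretation, not something stated above. The counts are made up, the country codes are placeholders, and the column names are assumed. It ignores sex; to treat females and males separately, sex would be added as a further join key.

```r
library(dplyr)

# Toy flows with MADE-UP counts: one row per origin-destination pair.
toy_pairs <- data.frame(
  origincode = c("AAA", "BBB", "AAA", "CCC"),
  destcode   = c("BBB", "AAA", "CCC", "AAA"),
  Y2000      = c(30, 10, 70, 5),
  stringsAsFactors = FALSE
)

# Self-join with the keys swapped, so each row meets its reverse flow
exchanges <- toy_pairs %>%
  inner_join(
    toy_pairs,
    by = c("origincode" = "destcode", "destcode" = "origincode"),
    suffix = c("_out", "_back")
  )

# Keep pairs whose larger flow is at most 3.3 times the smaller flow
balanced <- exchanges %>%
  mutate(ratio = pmax(Y2000_out, Y2000_back) / pmin(Y2000_out, Y2000_back)) %>%
  filter(ratio <= 3.3)

balanced
```

Each balanced pair appears twice in the result, once per direction. Pairs with a zero flow in one direction produce an infinite or undefined ratio and are dropped by the filter.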