Long to wide format with tidyr (and save it in n files)

The data comes from the https://esa.un.org/unpd/wpp/UN population projections

library(tidyr) #load tidyr or <a href="https://www.tidyverse.org/">tidyverse</a>, the latter being a collection of libraries

setwd("/Users/...") #set your working directory

dt <- read.csv("mydataset.csv", header=T) #read data

head(dt) #look at data

##   Index       Country Year Age Male_Pop Female_Pop
## 1     1 AmericanSamoa 2000   0      874        836
## 2     2 AmericanSamoa 2000   1      773        747
## 3     3 AmericanSamoa 2000   2      760        735
## 4     4 AmericanSamoa 2000   3      783        760
## 5     5 AmericanSamoa 2000   4      820        796
## 6     6 AmericanSamoa 2000   5      851        825

The idea would be to have a KEY column with the variables names and a VALUE column with the values. Since we have 2 value columns (male_pop and female_pop) we first need to gather them into 1 value column (Pop_sex) and then paste Pop_sex with Age.

# get it into the right format for "spread"
dt1 <- dt %>%

  gather(Pop_sex, value, 5:6) %>%

  unite(Pop_age, 5, 4, sep="_", remove=T) %>% # paste cols 5 and 4

  spread(Pop_age, value) %>% # spread into wide format

  write.csv(., file = "~/My folder of choice/nameofmyfile.csv") # this is optional

There’s a useful trick I’ve been using to get n csv files out of one long format dataset (eg. 1 file per year), I’ve found this somewhere in stackoverflow:

customFun  = function(mydt) {
  write.csv(mydt,paste0("name_",unique(mydt$year),".csv"))
  return(mydt)
}

mydt %>% 
  unite(newvar, 3:4, sep="_", remove=T) %>%
  spread(newvar, value) %>%
  group_by(year) %>% 
  do(customFun(.))

Note of the author: wide formats are never very useful but in case you really need them (linear regression &co) tidyr is a very compact solution. Be mindful that spreading over >1000 cols takes time. To get back from wide to long format use gather

Author: acarioli

is a researcher at the Geography and Environment department of the University of Southampton, WorldPop project team. She is also affiliated researcher at CED, UAB and Dondena Centre. Her interests include spatial econometrics and modeling, bayesian methods, machine learning processes, forecasting, micro-data simulation, and data visualization. Demo-traveler, Mac enthusiast, R zealot and Rladies member.