Regional age-sex composition in Germany by NUTS2 region, ages 60 to 80+

 

Data source: Eurostat, author's own elaboration. Population by age and sex as of 1 July 2020.

A function to call ggplot population pyramids in one line

What does it do?

If you have a dataset covering several (or all!) countries, regions, ages, etc. and you want a short function call that draws the plot without editing the ggplot2 code every time (e.g. for a given country in a given year), here’s a solution.

Just save the R script containing the plotting function and source it whenever you want to use it:

pyramid_plot.R
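
The original script is not reproduced here, so the following is only a minimal sketch of what such a one-line helper could look like. The function signature, the column names (iso, year, sex, age, pop) and the toy data are all assumptions made for illustration; they are not taken from pyramid_plot.R.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: draws a population pyramid for one country and one year.
# Assumed long-format columns (not from the original script):
#   iso, year, sex ("male"/"female"), age (ordered age-group label), pop
pyramid_plot <- function(data, country, yr) {
  d <- data %>%
    filter(iso == country, year == yr) %>%
    # males are plotted to the left of zero, females to the right
    mutate(pop_signed = if_else(sex == "male", -pop, pop))

  ggplot(d, aes(x = age, y = pop_signed, fill = sex)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(labels = abs) +
    geom_hline(yintercept = 0) +
    labs(title = paste(country, yr), x = "", y = "", fill = "")
}

# Toy data, made up purely to show the call
age_levels <- c("0-4", "5-9", "10-14")
toy <- expand.grid(
  iso = "IRQ",
  year = c(2000, 2010),
  sex = c("male", "female"),
  age = factor(age_levels, levels = age_levels),
  stringsAsFactors = FALSE
)
toy$pop <- seq(1000, by = 50, length.out = nrow(toy))

# The one-line call
p <- pyramid_plot(toy, "IRQ", 2000)
```

With the function saved in its own file, a call such as source("pyramid_plot.R") makes it available in any session, and printing p draws the plot.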

The next step will be to use facet_wrap to plot several sub-national administrative units at once and compare them against the national plots.
National-level data for Iraq can be found on my GitHub page.
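
As a rough idea of how the facet_wrap step might look, here is a small sketch on made-up data. The unit names, the column names and the population counts are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Toy long-format data; unit names and counts are invented for illustration
toy_units <- expand.grid(
  unit = c("National", "Region A", "Region B"),
  sex = c("male", "female"),
  age = c(0, 5, 10, 15),
  stringsAsFactors = FALSE
) %>%
  mutate(
    pop = 100 + age * 2,
    # males to the left of zero, females to the right
    pop_signed = if_else(sex == "male", -pop, pop)
  )

# One pyramid per unit, drawn side by side
p_facets <- ggplot(toy_units, aes(x = age, y = pop_signed, fill = sex)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  facet_wrap(~ unit)
```

Because national totals are much larger than regional ones, plotting shares of the total population rather than counts would probably make the panels easier to compare.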

Below is the population pyramid for Iraq in the year 2000:

IRQ_2000

Population pyramids updated

Load the relevant packages and the dataset. You can find the data on GitHub.

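library(tidyverse)
options(scipen = 9)  # avoid scientific notation in axis labels
setwd("/myworkingdirectory/")
# The file name was lost from the original post; read_csv() is an assumption.
# Replace "..." with the path to the downloaded data.
mydt <- read_csv("...") %>% filter(iso == 'UGA')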

The dataset includes population estimates at subnational level for Uganda.

# reshape the dataset into tidy (long) format

newdf <- mydt %>%
  gather(variable, value, 6:761) %>%
  separate(variable, c('year', 'sex', 'age'), sep = '_') %>%
  mutate(sex = if_else(sex == 'F', 'female', 'male')) %>%
  spread(year, value) %>%
  # age2: age-group labels (e.g. '5-9')
  mutate(age2 = recode(age, '1'='0-4', '4'='0-4', '5'='5-9','10'='10-14','15'='15-19', '20'='20-24', '25'= '25-29', '30'='30-34', '35'='35-39', '40'='40-44', '45'='45-49', '50'='50-54', '55'='55-59', '60'='60-64', '65'='65-69', '70'='70-74', '75'='75-79', '80'='80+')) %>%
  # age: codes 1 and 4 are merged into the 0 group
  mutate(age = recode(age, '1' = '0', '4' = '0'))
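
For readers on recent versions of tidyr, the gather/separate steps above can probably be written more compactly with pivot_longer. The sketch below uses a tiny made-up table whose column names follow the year_sex_age pattern implied by the separate() call; the values and the adm_id code are invented.

```r
library(dplyr)
library(tidyr)

# Toy wide table; column names mimic the year_sex_age pattern, values are made up
wide <- tibble(
  iso = "UGA", adm_id = "X1",
  `2000_F_0` = 10, `2000_M_0` = 12,
  `2005_F_0` = 11, `2005_M_0` = 13
)

long <- wide %>%
  # split each column name into year, sex and age in a single step
  pivot_longer(
    cols = -c(iso, adm_id),
    names_to = c("year", "sex", "age"),
    names_sep = "_",
    values_to = "pop"
  ) %>%
  mutate(sex = if_else(sex == "F", "female", "male"))
```

This produces one row per unit, year, sex and age group, which is the shape the plotting code expects.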

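# Part of this step was lost from the original post. The two conversions below are
# assumptions: age is made numeric for plotting and ageno indexes the age groups.
newdf$age <- as.numeric(newdf$age)
newdf$ageno <- as.numeric(factor(newdf$age))

newdf3 <- newdf %>%
  # columns 10:14 are expected to hold the year columns 2000, 2005, 2010, 2015 and 2020
  gather(key = year, value = pop, 10:14) %>%
  # mutate(pop = pop/1e03) %>%
  # keep a single administrative unit (== with a single string, not c())
  filter(iso == "UGA" & adm_id == "UGMIS2014452022", year %in% c(2000, 2005, 2010, 2015, 2020))

newdf4 <- newdf3 %>%
  group_by(iso, adm_id, id, year, sex, age, age2, ageno) %>%
  summarise(pop = sum(pop)) %>%
  mutate(ageno = ageno + 1)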

library(ggthemes)
ggplot(data = newdf4, aes(x = age, y = pop/1000, fill = year)) +
  # bars for all but 2100
  geom_bar(data = newdf4 %>% filter(sex == "female", year != 2100) %>% arrange(rev(year)),
           stat = "identity",
           position = "identity", width = 4.5) +
  geom_bar(data = newdf4 %>% filter(sex == "male", year != 2100) %>% arrange(rev(year)),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -pop/1000)) +
  coord_flip() +
  scale_y_continuous(labels = abs, breaks = seq(-600, 600, 250)) +
  geom_hline(yintercept = 0) +
  theme_economist_white(horizontal = FALSE) +
  scale_fill_economist() +
  labs(fill = "", x = "", y = "")


Composite plots: grid.arrange

I really like composite plots, where a top panel describes a phenomenon and a bottom panel gives a condensed view of the overall process over time.
I recently discovered a beautiful representation of educational differentials by gender, by Sara Lopus and Margaret Frye, and the beauty of this dataviz is that it tells a story on its own.

I used randomly generated data to reproduce the graph in ggplot, and grid.arrange from the gridExtra package to bind the two grobs, the top and bottom components.

grid.arrange(top, bottom, heights=c(10,5), widths=c(20), padding=0)
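
To make the call above concrete, here is a minimal sketch with two placeholder plots built from random data. The data frame, its column names and the two plots are assumptions and do not reproduce the original figure.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # random data, for illustration only
df <- data.frame(
  year = rep(2000:2010, 2),
  group = rep(c("a", "b"), each = 11),
  value = runif(22)
)

# Top panel: one line per group; bottom panel: the yearly mean across groups
top <- ggplot(df, aes(year, value, colour = group)) + geom_line()
bottom <- ggplot(df, aes(year, value)) + stat_summary(fun = mean, geom = "line")

# arrangeGrob() builds the layout without drawing it;
# grid.arrange() takes the same arguments and draws straight away
combined <- arrangeGrob(top, bottom, heights = c(10, 5))
grid::grid.newpage()
grid::grid.draw(combined)
```

The heights argument sets the relative size of the rows, so here the top panel takes two thirds of the page.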

I saved the map as a .png file, read it back with the png package, and used rasterGrob from the grid package to create a raster image graphical object.
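
A minimal sketch of that step follows, assuming the image is read with readPNG. The placeholder image written here is random noise standing in for the saved map, and the file path is a temporary one.

```r
library(png)
library(grid)

# Write a small placeholder image to a temporary file (it stands in for the saved map)
map_file <- tempfile(fileext = ".png")
writePNG(matrix(runif(100), nrow = 10, ncol = 10), map_file)

# Read the file back and wrap it as a raster image graphical object
img <- readPNG(map_file)
map_grob <- rasterGrob(img, interpolate = TRUE)
```

The resulting map_grob can then be passed to grid.arrange alongside ggplot objects, like any other grob.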
