Data source: EUROSTAT, personal elaboration of data. Population by age and sex at 1st of July 2020.
What does it do?
If you have a dataset covering several (or all!) countries, regions, ages, etc., and you want a short function call that plots a given subset (e.g. a certain country in a certain year) without modifying the ggplot2 code every time, here’s a solution.
Just save the R script containing the plotting function and call it whenever you want to use it.
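A minimal sketch of what such a function could look like. The function name plot_pyramid, the column names (country, year, age, sex, pop) and the toy data are all assumptions made for illustration; they are not taken from the actual dataset.

```r
library(ggplot2)

# Hypothetical helper: population pyramid for one country and one year.
# Assumes `data` has the columns country, year, age, sex ("male"/"female") and pop.
plot_pyramid <- function(data, country_code, yr) {
  d <- data[data$country == country_code & data$year == yr, ]
  # Male counts are negated so that they extend to the left of zero
  d$pop_signed <- ifelse(d$sex == "male", -d$pop, d$pop)
  ggplot(d, aes(x = age, y = pop_signed, fill = sex)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(labels = abs) +
    labs(title = paste(country_code, yr), x = "", y = "", fill = "")
}

# Toy data, invented purely for illustration
toy <- expand.grid(
  country = c("IRQ", "UGA"),
  year    = c(2000, 2020),
  age     = c("0-4", "5-9", "10-14"),
  sex     = c("male", "female"),
  stringsAsFactors = FALSE
)
# Fix the order of the age groups (otherwise they sort alphabetically)
toy$age <- factor(toy$age, levels = c("0-4", "5-9", "10-14"))
toy$pop <- seq_len(nrow(toy)) * 100

p <- plot_pyramid(toy, "IRQ", 2000)
print(p)
```

If the function lives in its own file, say pyramid.R (a hypothetical name), then source("pyramid.R") makes it available in any session, and each new plot is a one-line call.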
The next step will be to use facet_wrap to plot several sub-national administrative units at once, so they can be compared against the national plot.
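A rough sketch of how that could work, assuming a long-format table with one row per unit, age group and sex. The unit names, column names and numbers below are invented for illustration.

```r
library(ggplot2)

# Toy data, invented for illustration only
toy <- expand.grid(
  unit = c("National", "District A", "District B"),  # hypothetical unit names
  age  = c("0-4", "5-9", "10-14"),
  sex  = c("male", "female"),
  stringsAsFactors = FALSE
)
toy$pop <- ifelse(toy$unit == "National", 1000, 100)
# Male counts are negated so that they extend to the left of zero
toy$pop_signed <- ifelse(toy$sex == "male", -toy$pop, toy$pop)
# Fix the panel order (national first) and the order of the age groups
toy$unit <- factor(toy$unit, levels = c("National", "District A", "District B"))
toy$age  <- factor(toy$age, levels = c("0-4", "5-9", "10-14"))

p_facets <- ggplot(toy, aes(x = pop_signed, y = age, fill = sex)) +
  geom_col() +
  # One panel per unit; a free x axis lets small units sit next to the national total
  facet_wrap(~ unit, scales = "free_x") +
  scale_x_continuous(labels = abs) +
  labs(x = "", y = "", fill = "")
print(p_facets)
```

With scales = "free_x" each panel gets its own population axis, which makes the shapes comparable even when the totals differ by orders of magnitude. Drop it if the absolute sizes should be compared instead.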
Data for Iraq at national level can be found on my GitHub page.
Below is the population pyramid for Iraq in the year 2000.
Load the relevant packages and the dataset. You can find the data on GitHub here.
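library(tidyverse)
options(scipen = 9)  # prefer fixed over scientific notation
setwd("/myworkingdirectory/")
# The reading call is garbled in the original: read_csv() is an assumption
# and the file name is not given.
mydt <- read_csv("...") %>%
  filter(iso == 'UGA')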
The dataset includes population estimates at subnational level for Uganda.
# Reshape the dataset into tidy format
newdf <- mydt %>%
  gather(variable, value, 6:761) %>%
  separate(variable, c('year', 'sex', 'age'), sep = '_') %>%
  mutate(sex = if_else(sex == 'F', 'female', 'male')) %>%
  spread(year, value) %>%
  mutate(age2 = recode(age, '1' = '0-4', '4' = '0-4', '5' = '5-9', '10' = '10-14',
                       '15' = '15-19', '20' = '20-24', '25' = '25-29', '30' = '30-34',
                       '35' = '35-39', '40' = '40-44', '45' = '45-49', '50' = '50-54',
                       '55' = '55-59', '60' = '60-64', '65' = '65-69', '70' = '70-74',
                       '75' = '75-79', '80' = '80+')) %>%
  mutate(age = recode(age, '1' = '0', '4' = '0'))

# The original is garbled between "newdf$age" and the next gather() call.
# Assumed here: age is converted to numeric, and newdf3 is a placeholder name.
# The lost code presumably also created ageno, which is used further down.
newdf$age <- as.numeric(newdf$age)

newdf3 <- newdf %>%
  gather(key = year, value = pop, 10:14) %>%
  # mutate(pop = pop/1e03) %>%
  filter(iso == "UGA" & adm_id == "UGMIS2014452022",
         year %in% c(2000, 2005, 2010, 2015, 2020))

newdf4 <- newdf3 %>%
  group_by(iso, adm_id, id, year, sex, age, age2, ageno) %>%
  summarise(pop = sum(pop)) %>%
  mutate(ageno = ageno + 1)

library(ggthemes)

ggplot(data = newdf4, aes(x = age, y = pop/1000, fill = year)) +
  # bars for all years but 2100: females to the right of zero
  geom_bar(data = newdf4 %>% filter(sex == "female", year != 2100) %>% arrange(rev(year)),
           stat = "identity", position = "identity", width = 4.5) +
  # males to the left of zero (negated values)
  geom_bar(data = newdf4 %>% filter(sex == "male", year != 2100) %>% arrange(rev(year)),
           stat = "identity", position = "identity", mapping = aes(y = -pop/1000)) +
  coord_flip() +
  # breaks symmetric around zero (a step of 250 would give -600, -350, -100, 150, 400)
  scale_y_continuous(labels = abs, breaks = seq(-600, 600, 200)) +
  geom_hline(yintercept = 0) +
  theme_economist_white(horizontal = FALSE) +
  scale_fill_economist() +
  labs(fill = "", x = "", y = "")
I really like composite plots, where there’s a top part that describes a phenomenon and a bottom part with a synthetic time view of the overall process.
I’ve recently discovered this beautiful representation of educational differentials by gender, by Sara Lopus and Margaret Frye, and the beauty of this dataviz is that it tells a story on its own. (Click on the link for the publication)
I used randomly generated data to reproduce the graph in ggplot2, and the gridExtra package to bind the two grobs, the top and bottom components.
grid.arrange(top, bottom, heights=c(10,5), widths=c(20), padding=0)
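For a self-contained illustration of this step, here is a small sketch with random data. The two panels below are simple stand-ins, not the plots from the reproduction, and the data frame and its columns are made up.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # random data, for illustration only
df <- data.frame(year = 2000:2019, value = cumsum(rnorm(20)))

# Stand-in for the top panel (the detailed view)
top <- ggplot(df, aes(year, value)) + geom_line()
# Stand-in for the bottom panel (the compact view over time)
bottom <- ggplot(df, aes(year, value)) + geom_col()

# arrangeGrob() builds the layout without drawing it;
# grid.arrange() does the same and draws it straight away.
# heights are relative, so the top panel gets twice the space of the bottom one.
composite <- arrangeGrob(top, bottom, heights = c(10, 5))
grid::grid.draw(composite)
```

Keeping the result of arrangeGrob() in an object is handy when the composite has to be saved, for example with ggsave("composite.png", composite), where the file name is only an example.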
I saved the map as a .png file, read it back with the png package, and used rasterGrob from the grid package to create a raster image graphical object.
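The round trip can be sketched as follows. A tiny random image stands in for the saved map here, and the temporary file path is only a placeholder.

```r
library(png)
library(grid)

# A small random RGB image stands in for the saved map (illustration only)
img_path <- tempfile(fileext = ".png")
set.seed(1)
writePNG(array(runif(10 * 10 * 3), dim = c(10, 10, 3)), img_path)

# readPNG() returns a numeric array: height x width x channels
img <- readPNG(img_path)

# Wrap the array in a raster grob so it can be drawn or arranged like any other grob
map_grob <- rasterGrob(img, interpolate = TRUE)

grid.newpage()
grid.draw(map_grob)
```

Because map_grob is an ordinary grob, it can be passed to grid.arrange next to ggplot objects.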