Regional age-sex composition in Germany by NUTS2 60 to 80+ years old

 

Data source: EUROSTAT, personal elaboration of data. Population by age and sex at 1st of July 2020.

A function to call ggplot population pyramids in one line

What does it do?

If you have a dataset with several (or all!) countries/ regions/ ages/ etc and you want a short call function to plot without having to modify the ggplot2 code every time (e.g. a certain country for a certain year), here’s a solution.

Just save the r script with the function for the plot and call it when you want to use it:

pyramid_plot.R

The next step will be to use facet_wrap to select sub-national administrative units at once to be compared against the national plots.
Data can be found for Iraq at national level on my github page.

Below is the population pyramid for Iraq in year 2000

IRQ_2000

tidy datasets: ‘col_types’ to read a variable in the format you want

I have been having issues in my datasets as one variable is both numeric and text. Rather than arguing with excel, tidy offers a quick and painless solution: col_types:

dt <- read_csv("_yourdataset_.csv", col_types = cols(tricky_variable = "c"))[/sourcecode language="r"]

with date, time, numeric, integers, doubles, guess (let tidy guess the type) and skip as some of the available options.

Popupation pyramids updated

Upload the relevant packages and dataset. You can find the data on github here

library(tidyverse)
options(scipen = 9)
setwd("/myworkingdirectory/")
mydt % filter(iso=='UGA')

The dataset includes population estimates at subnational level for Uganda.

# reformat the dataset using tidy

newdf % gather(variable, value,6:761) %>% separate(variable,c('year','sex', 'age'), sep='_') %>% mutate(sex=if_else(sex=='F','female','male')) %>%
spread(year, value) %>%
mutate(age2=recode(age, '1'='0-4', '4'='0-4', '5'='5-9','10'='10-14','15'='15-19', '20'='20-24', '25'= '25-29', '30'='30-34', '35'='35-39', '40'='40-44', '45'='45-49', '50'='50-54', '55'='55-59', '60'='60-64', '65'='65-69', '70'='70-74', '75'='75-79', '80'='80+')) %>%
mutate(age=recode(age, '1'='0', '4'='0'))

newdf$age %
gather(key = year, value = pop, 10:14) %>%
# mutate(pop = pop/1e03) %>%
filter(iso == "UGA"&adm_id==c("UGMIS2014452022"), year %in% c(2000, 2005, 2010, 2015, 2020))

newdf4 %
group_by(iso, adm_id, id, year, sex, age, age2, ageno) %>%
summarise(pop= sum(pop)) %>%
mutate(ageno = ageno + 1)

library(ggthemes)
ggplot(data = newdf4, aes(x = age, y = pop/1000, fill = year)) +
#bars for all but 2100
geom_bar(data = newdf4 %>% filter(sex == "female", year != 2100) %>% arrange(rev(year)),
stat = "identity",
position = "identity", width = 4.5) +
geom_bar(data = newdf4 %>% filter(sex == "male", year != 2100) %>% arrange(rev(year)),
stat = "identity",
position = "identity",
mapping = aes(y = -pop/1000)) +
coord_flip() +
scale_y_continuous(labels = abs, breaks = seq(-600, 600, 250)) +
geom_hline(yintercept = 0) +
theme_economist_white(horizontal = FALSE) +
scale_fill_economist() +
labs(fill = "", x = "", y = "")

Screen Shot 2019-07-14 at 15.46.36