Abandon the rainbows?

Are rainbow palettes really that bad?
If it depended on me, my ideal color palettes would be the color of my favorite toy as a kid: fluo squishy slime, the kind that lights up in the dark. However, not everyone appreciates it, especially scientific journals.
In the end, the presence of colors should be motivated by what’s being represented: a cluster of lines depicting the TFR trends in every country in the world during the past century would not benefit from a 250 colors scale. However, it’s different for maps and heat-maps as the right palette can improve the message, condensing and directing information (see this Lancet paper ).
Rainbow palettes work however rather well (in my opinion) in some instances although they can easily be substituted with less catchy and more printer friendly colors.

I’ll always be a fan of bright colors but I see the point for plotting minimalism. I’ve really enjoyed this article: End of the Rainbow? New Map Scale is More Readable by People Who Are Color Blind

I have downloaded the json file from here and transformed it into a dataframe using the rjson package.
I have used a bunch of color palettes to compare results. Ever since, discovering the viridis palettes, I am a huge fan of the ‘magma’ and ‘inferno’ as their darkest color is a deep black and it’s easier to highlight everything else.


Red-yellow-green palette:

Screen Shot 2018-08-14 at 09.48.02

Red-purple palette:

Screen Shot 2018-08-14 at 09.48.13

Rainbow palette:

Screen Shot 2018-08-14 at 09.48.21.png

Magma palette:

Screen Shot 2018-08-14 at 09.50.51.png
Inferno palette:

Screen Shot 2018-08-14 at 09.51.14
Plasma palette:

Screen Shot 2018-08-14 at 09.51.45
Viridis palette:

Screen Shot 2018-08-14 at 09.52.04.png

Cividis palette (from cividis library here):
Screen Shot 2018-08-14 at 09.52.45.png

Greys palette:

Screen Shot 2018-08-14 at 09.53.06

Inverted grey palette:
Screen Shot 2018-08-14 at 09.53.14


Here’s the code for the cividis palette plot:

ggplot(dt, aes(order, year))+ geom_tile(aes(fill = temp)) + scale_fill_cividis(na.value = "transparent")+ scale_y_reverse(name='', breaks = c(1876, 1900, 1950, 2000, 2018), labels=c('1876', '1900', '1950', '2000', '2018'))+ scale_x_continuous(name='', breaks = c(30-15, 61-15, 92-15, 122-15), labels=c('June', 'July', 'August', 'September'))+ theme(axis.ticks.x = element_blank())+ geom_vline(xintercept=c(30, 61, 92), linetype = "longdash" ) 

Arranging ggplot2 graphs on a page

How to arrange graphs in ggplot2 without the help of the layout matrix

How do you arrange non-simmetric plots in ggplot2?
With the print command:

After installing these two packages: install.packages(“grid”, “ggplot2”), load the  libraries:

The data and code for the three graphs is taken from this website:

# create factors with value labels
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5), labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1), labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8), labels=c("4cyl","6cyl","8cyl"))

# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
a <- qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
main="Distribution of Gas Milage", xlab="Miles Per Gallon",

# Scatterplot of mpg vs. hp for each combination of gears and cylinders
# in each facet, transmittion type is represented by shape and color
b <- qplot(hp, mpg, data=mtcars, shape=am, color=am,
facets=gear~cyl, size=I(3),
xlab="Horsepower", ylab="Miles per Gallon")

c <- qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),
fill=gear, main="Mileage by Gear Number",
xlab="", ylab="Miles per Gallon")

a, b, and c are our graphs. Here we decide how to place the plots on the plotting surface:

grid.newpage() # Open a new page on grid device
pushViewport(viewport(layout = grid.layout(3, 1))) #this can really be anything... just remember to change accordingly the print commands below
print(a, vp = viewport(layout.pos.row = 1, layout.pos.col = 1:1))
print(b, vp = viewport(layout.pos.row = 2, layout.pos.col = 1:1))
print(c, vp = viewport(layout.pos.row = 3, layout.pos.col = 1:1))

The layout=grid.layout is the command dividing the plotting surface, in the example I have divided it into three rows and one column, hence the layout.pos.row = 1, 2, 3 and the layout.pos.row = 1:1 equal for all three plots.


What if I need something asymmetrical? For instance two small plots on one column and one taking up more space… The reasoning is very similar to that of the layout matrix: divide the space into 4 squares grid.layout(2, 2) and then plot the third graph over two rows layout.pos.row=1:2

grid.newpage() # Open a new page on grid device
pushViewport(viewport(layout = grid.layout(2, 2))) #this can really be anything... just remember to change accordingly the print commands below
print(a, vp = viewport(layout.pos.row = 1, layout.pos.col = 1:1))
print(b, vp = viewport(layout.pos.row = 2, layout.pos.col = 1:1))
print(c, vp = viewport(layout.pos.row = 1:2, layout.pos.col = 2:2))


A view of Spanish fertility by age groups (with the help of log scales)

I have been working a lot with the demography library in R, it is a great teaching tool for demography, modeling, life tables, graphic visualization of demographic data, and for many other things (see demography ).
There are a lot of examples available using data from the Human Fertility and Mortality Database.
Here I am using data that I have obtained from Spanish Statistics, a fertility rates time series consisting of 5 years age groups (available from download from here).
It is very nice to plot fertility rates by age groups as one can appreciate the changes in fertility occurred over time (in terms of quantum) and how much each age group contributes to fertility. In the case of Spain,.



The very same plot can be obtained through ggplot2 library (given an appropriate theme (see ggplot themes):

ggplot(ddfert, aes(Year, Female, group= Age,col= Age))+
scale_color_manual(values= c("red", "yellow", "lightgreen", "green","lightblue", "blue", "violet"))+
scale_x_continuous(labels = c(1975, 1985, 1995, 2005, 2015))+
scale_y_continuous("Fertility Rate")


I find it often interesting to plot using a log scale, so that small values don’t get compressed to the end of the graph. In this case it would be sufficient to add to the demography code:
plot(spain, plot.type="time", xlab= "Year", lwd=2, transform=T)...
and to ggplot :
ggplot(ddfert, aes(Year, log(Female), group= Age,col= Age))+...


The gap between desired and observed fertility in Europe. Part 2: Childlessness levels.

To better understand the effect of postponement we tried to measure it by calculating the effect of time spent on contraception while in a union by women who want to have children, a ‘conscious’ way to postpone childbearing.

Involuntary childlessness has gained momentum in mainstream media, which attribute a large part (if not the totality) of the blame on the postponement of childbearing: women wait too long to have children, they don’t hear their biological clock ticking and bam! no children. Ever.

Delaying childbearing to later ages has undoubtedly a repercussion on the biological ability to have children, but it is hardly a simple component of the total effect. What the mainstream discussion is often missing on is that the great majority of children are conceived in unions, hence it is a couple’s decision to have children. Indeed, being single is an important if not pivotal deterrent to motherhood, usually delayed until union formation.

This is why it is important to consider factors such as union dissolution risk to appreciate the variation in involuntary childlessness. To better understand the effect of postponement we tried to measure it by calculating the effect of time spent on contraception while in a union by women who want to have children, a ‘conscious’ way to postpone childbearing.

This is a preview of average population childlessness obtained through simulation using 3 variables: celibacy (%of women ending up single and never entering a union), divorce (%women previously in a union but currently without a partner), and waiting time, the average time spent on contraception at the beginning of a union by a woman who wishes to have children.


>ggplot(dt, aes( Age, value, linetype=Variable, col=Variable))+
> geom_line( size=1) +
> scale_color_manual( values=c( "black", "#666666", "grey","black", "#666666", "grey"), guide=guide_legend( nrow=3, byrow=F, title =  "Childlessness" )) +
> xlab("")+
>scale_linetype_manual( values=c("solid", "solid",  "solid", "twodash", "dotted", "dashed"), guide=guide_legend( nrow=3, byrow= F, title =  "Childlessness" ))+
>theme( plot.margin= unit(c(1,4,1,1), "cm"), legend.position="bottom", legend.direction= "vertical")

1. ggplot(dt, aes( Age, value, linetype= Variable, col=Variable))

linetype= Variable and col=Variable set in the aes tell ggplot to automatically divide the lines based on the number of Variable(s);

2. scale_color_manual sets the colors of the lines contained in values. I was not satisfied with what I got with scale_color_grey so I set my colors manually (_manual!);

3. since I want the legend at the bottom AND in two columns (or 3 rows) AND I have two features specified in the aes I need to add a guide=guide_legend(nrow=3) to each scale_blablabla_manual (that is to say scale_color_manual AND scale_linetype_manual);

4. In guide=guide_legend the byrow=F means that I do not want the legend to appear ordered by row, but rather by columns;

5. in theme( legend.position=”bottom”) tells ggplot to put the legend below the graph and legend.direction to plot it in a vertical way (which I divide in 3 rows)

1887 crude mortality rate in Spain using classInt package

TBM_1887 jenks
Crude Mortality Rate in Spain, 1887 Census

TBM_1887 quantile TBM_1887 bclust TBM_1887 fisher

>nclassint <- 5 #number of colors to be used in the palette
>cat <- classIntervals(dt$TBM, nclassint,style = "jenks")
>colpal <- brewer.pal(nclassint,"Reds")
>color <- findColours(cat,colpal) #sequential
>bins <- cat$brks
>lb <- length(bins)

style: jenks
[20.3,25.9] (25.9,30.5] (30.5,34.4] (34.4,38.4] (38.4,58.2]
68         114         130         115          35

Save the categories into a data.frame (dat)

type first second third fourth fifth
1 quantile    91     93    92     91    95
2       sd    10    202   244      5     0
3    equal   100    246   113      2     1
4   kmeans    68    115   142    118    19
5    jenks    68    114   130    115    35
6   hclust   100    174   153     34     1
7   bclust    53    120   275     13     1
8   fisher    68    114   130    115    35

and melt it into a long format (required by ggplot):

dat1 <- melt(dat,id.vars=c("type"),value.name="n.breaks")

geom_bar(stat="identity", position=position_dodge())


Quick way to add annotations to your ggplot graphs

Lately, some of the graphs I have been working on have “strange/erratic” values, so I thought to plot those values in a different color and rather than adding an extra legend line I have decided to add a note to explain the difference. Among the many options available, I have found a very quick and harmless way to add annotations to ggplot graphs. It uses the library “gridExtra”, which employs user-level functions that work with “grid” graphics and draw tables.

1.load ggplot2 library
2. and save your graph as “my_graph”:
my_graph<- qplot(wt, mpg, data = mtcars)
3. load gridExtra and add the text to the graph. Note that x, hjust and vjust give the position of the text in the outer margins. If you want to annotate INSIDE the graph, use annotate:
g <- arrangeGrob(p, sub = textGrob("I pledge my life and honor to the Night's Watch, \nfor this night and all the nights to come.", x=0, hjust=-0.1, vjust=0.1,gp = gpar(fontface = "italic", fontsize = 10)))

5. save the graph
ggsave("my_graph_with_note.pdf", g, width=5,height=5)







here is the graph I have been working on:


Valar Morghulis: Some charts using GOT (tv-show) deaths

Drawing from one of the most important demographic laws, Valar Morghulis (all men must die), here is a simple summary of the deadly happenings in four seasons of GOT as reported by the Washington Post.

Let’s start by the total number of (portrayed) deaths by season:

df1 ggplot(df1,aes(x=factor(Series),y=Total))+
xlab("Season number")+
ylab("Total number of deaths")

Number of deaths by season box-plot
Number of deaths by season

xlab("Season number")+
ylab("Total number of deaths")

Number of deaths by season

by location in Westeros:

df2 Location=c("King's Landing","Beyond the Wall","Castle Black","The Twins","The Riverlands")
ylab("Total number of deaths")+

Number of deaths by location

by method of death:
df3 Method=c("Animal","Animal Death","Arrows","Axe","Blade","Bludgeon","Crushing","Falling","Fire","Hands","HH item","Mace","Magic","Other","Poison","Spear","Unknown")
df3.1 df3.2 ggplot(df3.2,aes(x=factor(Method),y=value,fill=variable))+
scale_fill_discrete(name ="Method of Death", labels=c("Season 1", "Season 2", "Season 3", "Season 4"))

Number of deaths by method
and lastly by House allegiance:
df4 House df4.1 df4.2 ggplot(df4.2,aes(x=reorder(factor(House),value),y=value,fill=variable))+
scale_fill_discrete(name ="House Allegiance", labels=c("Season 1", "Season 2", "Season 3", "Season 4"))+

Number of deaths by house