The Spatial side of Demography at EPC…

Come meet me at EPC to discuss Spatial Demography!

Thursday, 7 June 2018, 9:00-10:30 Session 1 Motherhood, Labor Market and Wages: Contextual Determinants of Childbearing in Spain: A Spatial Panel Study

Friday, 8 June 2018, 11:00-12:30 Session 65 Fertility Trends and ProspectsThe Geography of Fertility Rates in Low and Middle-Income Countries: Analysis of Cross-Sectional Surveys from 74 Countries

Global and Local Measures of Spatial Autocorrelation

This post aims at being a summary of the available techniques to investigate spatial autocorrelation for the social sciences, rather than presenting the theory behind spatial autocorrelation. For that there are great books available on line, like Anselin’s, Le Page, and Bivand just to cite a few.

The techniques presented here work for a spatial polygons data frame. The difference between spatial points and polygons data frames is not that big, the idea is the same and most of what I am doing here can be applied to data points.

Why do we look at spatial autocorrelation at all? Spatial autocorrelation leads to biased results in regressions, this is the reason why we want to compute Moran’s I and why we include spatial autocorrelation if its measurement proves to be significative and non-random.

Spatial autocorrelation can be investigated globally or locally. “Globally”, implies that the measure you’re going to obtain refers to the dataset as a whole, whether it is a whole country, continent or region. “Locally”, means that you are taking into consideration each and every polygon and getting a measure for each one of them.

We start by uploading the data, projecting them (important when considering the distance based measures -earth is not flat, whatever they may say!), and construct neighborhood relations (in this case Queen and Rook, but could be any other). For more detail see this post on how to construct neighbor relations.

NC= readShapePoly(system.file("shapes/sids.shp", package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
nb.FOQ = poly2nb(NC, queen=TRUE, row.names=NC$FIPSNO)
nb.RK = poly2nb(NC, queen=FALSE, row.names=NC$FIPSNO)

Global measures of spatial autocorrelation

There are two main measures of global spatial autocorrelation: Moran’s I and Geary’s C. Moran’s I is the most used in my experience, but both work are perfectly acceptable.
Moran’s I ranges between -1 (strong negative spatial autocorrelation with a dispersed pattern) and 1 (strong positive spatial autocorrelation with a clustered pattern) with 0 being the absence of spatial autocorrelation.
Geary’s C ranges between 0 and 2, with positive spatial autocorrelation ranging from 0 to 1 and negative spatial autocorrelation between 1 and 2.
Of course being these inferential measures, if the p-value is non significant we cannot exclude that the patterns could be random(!)

In this case Moran’s I is positive and significant, the z-score (not provided by moran.test) is positive implying spatial clusters, so we can reject the null hypothesis.

nwb <- NC$NWBIR74
moran.test(nwb, listw = nb2listw(nb.RK))
##  Moran I test under randomisation
## data:  nwb
## weights: nb2listw(nb.RK)
## Moran I statistic standard deviate = 3.0787, p-value = 0.001039
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance
##       0.185965551      -0.010101010       0.004055701
geary.test(nwb, listw = nb2listw(nb.RK))
##  Geary C test under randomisation
## data:  nwb
## weights: nb2listw(nb.RK)
## Geary C statistic standard deviate = 2.0324, p-value = 0.02106
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance
##        0.83274888        1.00000000        0.00677185

if you have polygons with no neighbors remember to specify zero.policy=NULL

moran.plot(nwb, listw = nb2listw(nb.RK))

moran plot, listw = nb2listw(nb.RK), nsim=100)
##  Monte-Carlo simulation of Moran I
## data:  nwb
## weights: nb2listw(nb.RK)
## number of simulations + 1: 101
## statistic = 0.18597, observed rank = 100, p-value = 0.009901
## alternative hypothesis: greater

permutation Moran_s I

plot(, listw = nb2listw(nb.RK), nsim=100))

Same thing can be done for Geary’s C:, listw = nb2listw(nb.RK), nsim=100)
##  Monte-Carlo simulation of Geary C
## data:  nwb
## weights: nb2listw(nb.RK)
## number of simulations + 1: 101
## statistic = 0.83275, observed rank = 5, p-value = 0.0495
## alternative hypothesis: greater
plot(, listw = nb2listw(nb.RK), nsim=100))

Local measures of spatial autocorrelation

Local Moran and Local G

locm <- localmoran(nwb, listw = nb2listw(nb.RK))
locG <- localG(nwb, listw = nb2listw(nb.RK))

Get the neighbor matrix into a listwise format with listw: there’s two options here, row-standardized weights matrix style = "W" creates proportional weights when polygons have an unequal number of neighbors, balancing out observations with few neighbors. Binary weights style = "B" upweight observations with many neighbors.

myvar <- NC$NWBIR74
nb <- nb.RK
# Define weight style
ws <- c("W")

# Define significance for the maps
significance <- 0.05
plot.only.significant <- TRUE

# Transform the neigh mtx into a listwise object
listw <- nb2listw(nb, style=ws)

# Create the lagged variable
lagvar <- lag.listw(listw, myvar)

# get the mean of each
m.myvar <- mean(myvar)
m.lagvar <- mean(lagvar)

The next step is to derive the quadrants and set the coloring scheme. I like to color the border of each polygon with the color of their local moran score, regardless of their pvalue, and then fill only the significant ones.

n <- length(NC)
vec <- c(1:n)
vec <- ifelse(locm[,5] < significance, 1,0)

# Derive quadrants
q <- c(1:n) for (i in 1:n) {   if (myvar[[i]]>=m.myvar & lagvar[[i]]>=m.lagvar) q[i] <- 1
  if (myvar[[i]]<m.myvar & lagvar[[i]]<m.lagvar) q[i] <- 2
  if (myvar[[i]]<m.myvar & lagvar[[i]]>=m.lagvar) q[i] <- 3   if (myvar[[i]]>=m.myvar & lagvar[[i]]<m.lagvar) q[i] <- 4

# set coloring scheme
q.all <- q
colors <- c(1:n)
for (i in 1:n) {
  if (q.all[i]==1) colors[i] <- "red"
  if (q.all[i]==2) colors[i] <- "blue"
  if (q.all[i]==3) colors[i] <- "lightblue"
  if (q.all[i]==4) colors[i] <- "pink"
  if (q.all[i]==0) colors[i] <- "white"   if (q.all[i]>4) colors[i] <- "white"

# Mark all non-significant regions white
locm.dt <- q*vec
colors1 <- colors
for (i in 1:n)
  if ( !( (locm.dt[i])) )  {
  if (locm.dt[i]==0) colors1[i] <- "white"
colors2 <- colors
colors2 <- paste(colors2,vec)
pos = list()
for (i in 1:n) {
  pos[[i]] <- c(which(NC$NWBIR74==colors2["blue 0"]))

blue0 <- which(colors2=="blue 0")
red0 <- which(colors2=="red 0")
lightblue0 <- which(colors2=="lightblue 0")
pink0 <- which(colors2=="pink 0")
lb <- 6
labels=c("High-High", "High-Low", "Low-High", "Low-Low")

# plot the map
if (plot.only.significant==TRUE) plot(NC, col=colors1,border=F) else
  plot(NC, col=colors,border=F)
legend("bottomleft", legend = labels, fill = c("red", "pink", "lightblue", "blue"), bty = "n")

lisa map
Local G gives back z-scores values and indicate the posibility of a local cluster of high values of the variable being analysed, very low values indicate a similar cluster of low values.


nclassint <- 3
colpal <- brewer.pal(nclassint,"PiYG")
cat <- classIntervals(locG, nclassint, style = "jenks", na.ignore=T)
color.z <- findColours(cat, colpal)

plot(NC, col= color.z, border=T)


# color only significant polygons
plot(NC, border=T)
plot(NC[which(vec==1),], col=color.z[which(vec==1)], border=T, add=T)

lisa map2

Moran plots in ggplot2

Moran plots are one of the many way to depict spatial autocorrelation:
where “varofint” is the variable we are studying, “listw” a listwise neighbourhood matrix, and the function “moran.test” performs the Moran’s test (duh!) for spatial autocorrelation and is included in the spdep funtionality. The same plot can be done using ggplo2 library. Provided that we already have our listwise matrix of neighborhood relationships listw, we first define the variable and the lagged variable under study, computing their mean and saving them into a data frame (there are a lot of datasets you can find implemented in R: afcon, columbus, syracuse, just to cite a few). The purpose is to obtain something that looks like this (I have used my own *large* set of Spanish data to obtain it):


Upload your data. Here is Anselin (1995) data on African conflicts, afcon:

varofint listw varlag <- "Total Conflicts"
m.varofint m.varlag
and compute the local Moran's statistic using localmoran:

and save everything into a dataframe:

use these variables to derive the four sectors "High-High"(red), "Low-Low"(blue), "Low-High"(lightblue), "High-Low"(pink):
df$sector significance vec =df$m.varofint & df$varlag>=df$m.varlag]  df$sector[df$varofint<df$m.varofint & df$varlag<df$m.varlag]  df$sector[df$varofint<df$m.varofint & df$varlag>=df$m.varlag]  =df$m.varofint & df$varlag<df$m.varlag]


df$sector.col[df$] <- "red"
df$sector.col[df$] <- "blue"
df$sector.col[df$] <- "lightblue"
df$sector.col[df$] <- "pink"
df$sector.col[df$] <- "white"

df$sizevar df$sizevar 0.1)
to get the ggplot graph:
p 0.05", "High-High", "Low-Low","Low-High","High-Low"))+
axis.text= element_text(colour="black", size=20, angle=0,face = "plain"),
axis.text.x  = element_text(hjust=.5, vjust=.5),
axis.text.y  = element_text(hjust=1, vjust=1),
strip.text.x  = element_text(size = 20, colour ="black", angle = 0),
plot.title= element_text(size=20))+
stat_smooth(method="lm",se=F,colour="black", size=1)+
theme(legend.background =element_rect("white"))+
legend.text =element_text(size=20))

Check out the interactive shiny version on pracademic