## Game of Thrones maps in R…

A map of the GOT world with rivers, roads, lakes, the Wall, and the main cities:

Neighborhood relations according to Sphere of Influence pretty much coincide with roads and rivers (package spdep):

Paste some images to locate the (surviving) Stark family members, reading them with readPNG from the png library and drawing them with the base rasterImage function:
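A minimal sketch of that image-pasting step (the file name and plot coordinates here are hypothetical stand-ins; on the actual map they would be the character portraits and their map locations):

```r
library(png) # for readPNG

# hypothetical portrait file and coordinates
img <- readPNG("arya.png")
plot(0:10, 0:10, type = "n", xlab = "", ylab = "")
# draw the PNG inside the given x/y limits of the existing plot
rasterImage(img, xleft = 2, ybottom = 2, xright = 4, ytop = 4)
```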

## Creating neighborhood matrices for Spatial Polygons in R (updated)

One of the first steps in spatial analysis is to create a neighborhood matrix, that is, to define a relationship/connection between each and (ideally!) every polygon. Why? Given that the premise of spatial analysis is that neighboring locations are more similar than distant ones, we need to define what "near" means: a set of neighbors for each location that captures such dependence.

There are many ways to define neighbors, and usually, they are not interchangeable, meaning that one neighborhood definition will capture spatial autocorrelation differently from another.

In R, the spdep package allows you to create a neighbor matrix according to a wide range of definitions: contiguity, radial distance, graph based, triangulation, and more. The three main and most used neighbor definitions are:

A) Contiguity based of order 1 or higher (most used in social sciences)

B) Distance based

C) Graph based

Install and load the maptools and spdep libraries, then read the shapefile of North Carolina counties:

```r
library(maptools)
library(spdep)
NC = readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
                   IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
```

## A. Contiguity based relations

Contiguity-based relations are the most used with irregular polygons of varying shape and surface, since contiguity ignores distance and focuses instead on the location of an area. The function poly2nb allows you to create two types of contiguity-based relations:

### 1. First Order Queen Contiguity

FOQ contiguity defines a neighbor when at least one point on the boundary of one polygon is shared with at least one point of its neighbor (common border or corner);

```r
nb.FOQ = poly2nb(NC, queen=TRUE, row.names=NC$FIPSNO)
# row.names refers to the unique names of each polygon
nb.FOQ
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 490
## Percentage nonzero weights: 4.9
## Average number of links: 4.9
```

Calling nb.FOQ prints a summary of the neighbor matrix, including the total number of areas/counties and the average number of links.
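Most spdep tests and models expect a spatial weights (listw) object rather than a raw neighbor list, so a common next step is the conversion below; a minimal sketch using the nb.FOQ object created above:

```r
# convert the neighbour list to a row-standardised spatial weights object;
# zero.policy=TRUE keeps regions with no neighbours instead of failing
lw.FOQ <- nb2listw(nb.FOQ, style = "W", zero.policy = TRUE)
summary(lw.FOQ, zero.policy = TRUE)
```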

### 2. First Order Rook Contiguity

FOR contiguity does not include corners, only borders, thus comprising only polygons sharing more than one boundary point;

```r
nb.RK = poly2nb(NC, queen=FALSE, row.names=NC$FIPSNO)
nb.RK
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 462
## Percentage nonzero weights: 4.62
## Average number of links: 4.62
```

NB: if there is a region without any link, the summary will flag it with a message like this:

```
## Neighbour list object:
## Number of regions: 910
## Percentage nonzero weights: 0.5924405
## 10 regions with no links:
## 1014 3507 3801 8245 9018 10037 22125 30005 390299 390399
```

where you can identify the regions with no links (1014, 3507, …) using which(…); in R it is possible to "manually" connect them, or to switch to a neighbor definition that includes them (such as graph- or distance-based neighbors).
Sometimes polygons that have been retouched (sounds like blasphemy, but it happens a lot with historical maps) fail to recognize shared borders. This is when manually setting up neighbors comes in handy (you can't do that in GeoDa).
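The manual fix can be sketched like this: an nb object is essentially a list of integer vectors, one per region, with 0L marking "no neighbors", so a missing link can be added by hand on both sides to keep the matrix symmetric. A toy base-R example (in practice the nb object would come from poly2nb, and the region indices would be the ones flagged above):

```r
# toy nb-like structure: 4 regions, region 4 has no links (coded 0L)
nb <- structure(list(2L, c(1L, 3L), 2L, 0L),
                class = "nb", region.id = as.character(1:4))

# connect region 4 to region 3, adding the link in both directions
nb[[4]] <- 3L
nb[[3]] <- sort(unique(c(nb[[3]], 4L)))
nb[[3]] # region 3's neighbours are now 2 and 4
```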

Higher order neighbors are useful when looking at the effect of lags on spatial autocorrelation and in spatial autoregressive models like SAR with a more global spatial autocorrelation:

```r
nb.SRC = nblag(nb.RK, 2) # second-order rook contiguity
nb.SRC
```
```## [[1]]
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 490
## Percentage nonzero weights: 4.9
## Average number of links: 4.9
##
## [[2]]
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 868
## Percentage nonzero weights: 8.68
## Average number of links: 8.68
##
## attr(,"call")
## nblag(neighbours = nb.RK, maxlag = 2)
```
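If the goal is a single neighbor matrix covering both orders rather than a list of lagged matrices, the lags can be cumulated; a sketch assuming the nb.SRC object from above:

```r
# merge first- and second-order neighbours into one nb object
nb.SRC.cum <- nblag_cumul(nb.SRC)
nb.SRC.cum
```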

## B. Distance based neighbors

DBN defines a set of connections between polygons based either on (1) a defined Euclidean distance between centroids (dnearneigh) or (2) a set number of nearest neighbors (knn2nb), e.g. the 5 nearest neighbors;

```r
coordNC = coordinates(NC) # get centroid coordinates
d05m = dnearneigh(coordNC, 0, 0.5, row.names=NC$FIPSNO)
nb.5NN = knn2nb(knearneigh(coordNC, k=5), row.names=NC$FIPSNO) # set the number of neighbors (here 5)
d05m
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 430
## Percentage nonzero weights: 4.3
## Average number of links: 4.3
```
```r
nb.5NN
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 500
## Percentage nonzero weights: 5
## Average number of links: 5
## Non-symmetric neighbours list
```

A little trick: if you want information on neighbor distances, whatever the type of neighborhood may be:

```r
distance = unlist(nbdists(nb.5NN, coordNC))
distance
```
```##   [1] 0.3613728 0.3693554 0.3864847 0.2766561 0.5168459 0.3709748 0.2607982
##   [8] 0.3232974 0.4376632 0.2862144 0.5773310 0.3778483 0.4463538 0.2914539
## ...
## [498] 0.3407192 0.3995114 0.1838115
```

## C. Graph based (I’ve never used them, but it’s good to know that they exist)

• Delaunay triangulation tri2nb constructs neighbors through a triangulation in which each centroid is a node. As a consequence, DT ensures that every polygon has a neighbor, even in the presence of islands. The "problem" with this specification is that it treats our area of study as if it were an island itself, without any neighbors (as if North Carolina had no Virginia or South Carolina)… Therefore, distant points that should not be neighbors (such as Cherokee and Brunswick counties) become such;
• Gabriel graph gabrielneigh is a particular case of the DT, where a and b are neighboring points/centroids if the circle passing through a and b with diameter ab contains no other point/centroid;
• Sphere of influence soi.graph: two points a and b are SOI neighbors if the circles centered on a and b, with radii equal to the nearest-neighbor distances of a and b respectively, intersect twice. It is a sort of Delaunay triangulation without the longest connections;
• Relative neighbors relativeneigh is a particular case of the GG. A border belongs to RN if the lens formed by the intersection of the two circles centered on a and b with radius ab contains no other point.

```r
delTrinb = tri2nb(coordNC, row.names=NC$FIPSNO) # Delaunay triangulation
summary(delTrinb)
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 574
## Percentage nonzero weights: 5.74
## Average number of links: 5.74
##
##  2  3  4  5  6  7  8  9 10
##  1  2 13 29 27 22  3  1  2
## 1 least connected region:
## 2 most connected regions:
## 37005 37179 with 10 links
```
```r
GGnb = graph2nb(gabrielneigh(coordNC), row.names=NC$FIPSNO) # Gabriel graph
summary(GGnb)
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 204
## Percentage nonzero weights: 2.04
## Average number of links: 2.04
## 20 regions with no links:
## 37109 37131 37137 37141 37145 37147 37151 37159 37161 37165 37173 37175 37179 37183 37185 37187 37189 37195 37197 37199
## Non-symmetric neighbours list
##
##  0  1  2  3  4  5  6  7
## 20 27 16 15 13  7  1  1
## 27 least connected regions:
## 37047 37053 37055 37075 37091 37105 37107 37113 37115 37117 37119 37121 37129 37133 37135 37139 37143 37149 37153 37155 37157 37163 37167 37177 37181 37191 37193 with 1 link
## 1 most connected region:
```
```r
SOInb = graph2nb(soi.graph(delTrinb, coordNC), row.names=NC$FIPSNO) # sphere of influence
summary(SOInb)
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 470
## Percentage nonzero weights: 4.7
## Average number of links: 4.7
##
##  1  2  3  4  5  6  7  9
##  1  5 12 26 30 15 10  1
## 1 least connected region:
## 1 most connected region:
```
```r
RNnb = graph2nb(relativeneigh(coordNC), row.names=NC$FIPSNO) # relative neighbor graph
summary(RNnb)
```
```## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 133
## Percentage nonzero weights: 1.33
## Average number of links: 1.33
## 31 regions with no links:
## 37047 37053 37097 37107 37109 37115 37131 37137 37141 37143 37145 37147 37151 37155 37159 37161 37163 37165 37167 37173 37175 37179 37183 37185 37187 37189 37191 37193 37195 37197 37199
## Non-symmetric neighbours list
##
##  0  1  2  3  4
## 31 30 18 17  4
## 30 least connected regions:
## 37009 37027 37031 37035 37037 37039 37055 37073 37075 37083 37091 37095 37105 37113 37117 37119 37121 37125 37127 37129 37133 37135 37139 37149 37153 37157 37169 37171 37177 37181 with 1 link
## 4 most connected regions:
## 37001 37003 37059 37079 with 4 links
```

#### What to do with all this stuff? …

• compute and compare global Moran's I;
• LISA maps;
• variograms and correlograms;
• …?
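As a sketch of the first item, global Moran's I can be computed from any of the neighbor lists built above; here using the queen contiguity list and the SID74 counts that ship with the sids shapefile:

```r
# row-standardised weights from the queen contiguity list
lw <- nb2listw(nb.FOQ, style = "W")
# global Moran's I test for spatial autocorrelation in the SID counts
moran.test(NC$SID74, lw)
```

Comparing the resulting statistic across different neighbor definitions (rook, k-nearest, graph based) is a quick way to see how sensitive the measured autocorrelation is to the neighborhood choice.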

## Coloring maps in R for social sciences

“It’s all about matching perceptual dimensions with data dimensions” Cindy Brewer

Plotting a map raises two issues, colors and scale, and the two have to work together to best describe your data, or the map fails.

To get a better sense of how colors look on a map, I find the Color Brewer website by Cindy Brewer very useful (read a recent interview here): it gives advice on map colors, hues, and scales for various backgrounds and contexts.

R makes it easy to choose a palette thanks to the RColorBrewer library, so you don't have to create one yourself (you can see the available combinations by calling the display.brewer.all() function):

• Sequential: continuous variables with data ranging from relatively low to relatively high or interesting values;
• Qualitative: categorical variables with no specific ordering;
• Diverging: continuous variables where both the low and high extremes are interesting, or a scale running from negative to positive values. Usually cold colors denote low or negative (-) values while warm colors (red, orange) denote high or positive (+) values. Also, the mid-point should mean something and add information to your message (e.g. the national average);
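One example palette of each type, pulled with brewer.pal (the palette names can be browsed via display.brewer.all()):

```r
library(RColorBrewer)

brewer.pal(5, "Blues") # sequential
brewer.pal(5, "Set2")  # qualitative
brewer.pal(5, "RdBu")  # diverging
```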

The choice of colors is trickier, depending on the subject mapping, the message to convey and the context:

• Rainbow palette: think again;
• Red means “look at me”, so use it to highlight something meaningful;

• Similarly use bright or dark colors to highlight important information, contrasting it to softer/pale tones;
• Prefer a single hue palette if possible;
• Be aware that specific colors may have specific cultural meanings (!), a few examples:
1. Red: in South Africa it’s the color of mourning;
2. Orange: color of Protestants in Ireland;
3. Yellow: color of mourning in Egypt and a positive color in Asia;
4. Purple: color of mourning in Thailand.

For further information, see this comprehensive graph of colors in culture. A good rule is also not to choose colors that carry a good/bad message (green vs red or blue vs red), unless you are mapping the number of drowned kittens.

Once the color issue is solved (which, I'd like to stress, should be weighed together with the choice of a specific palette), we can choose the number of class intervals, that is to say, how many colors we are going to use. If I use a diverging palette for a continuous variable, I prefer an odd number of colors, either 5 or 7 (but this fits my specific mapping requirements), so that the mid value has a neutral light tone. Also, ideally breaks should mean something and not be arbitrarily chosen. R once again has a solution for this, the classInt library, which provides a set of styles to choose from for continuous numerical variables (sd, equal, pretty, quantile, kmeans, hclust, bclust, fisher, or jenks), as well as the option to set them manually (fixed).

• equal: equal-width intervals, ideal for data with a normal distribution;
• quantile: quantiles are good for data with a skewed distribution;
• jenks/fisher: my personal favorites; they reduce the variance within classes and maximize the variance between classes;

```r
n <- 5 # how many colors?
# variable: my variable of choice
category <- classIntervals(variable, n, style = "jenks", na.ignore = TRUE)
palette <- brewer.pal(n, "RdBu")
color <- findColours(category, palette)
bins <- category$brks
lb <- length(bins)
plot(spain, col = color, border = TRUE)
legend("topright", fill = palette,
       legend = paste(round(bins[-length(bins)], 1), "-", round(bins[-1], 1)),
       cex = 2, bg = "white")
```

Of course, we can edit pretty much everything to tailor the map to our needs and preferences. For instance, the above map portrays 910 areas, so I prefer to suppress borders to avoid overcrowding by setting `plot(spain, col=color, border=F)`, and to use the `layout` function to separate the plot from the legends, getting something like this:

`layout(matrix(c(1,2,3),1,3,byrow=T), widths=c(1,1,0.35), heights=1)`