Many of the research questions academics, governments, and companies are interested in are spatial in nature. You want to know who buys your products, of course. But also, do they live near each other? Based on the demographic profiles of your customers, is there a location that may be ripe for business? Maybe you want to know the effect of a policy experiment that some localities have implemented and others have not. Great, which places and where are they? Are they all near a large urban center, which we think may confound our estimates? Perhaps you want to know how to allocate resources to fight crime. Are there some locations that experience crime with more or less intensity? You get the idea.
A picture is worth 1,000 words. If you can make a clear map, you can easily communicate the highly complex variation you care about. I could show you a ton of tables relating the locations of property crimes to (1) distance to public transportation, (2) average income around the location of the crime, and (3) whether the crime occurred in a retail district…but I could also just show you the map, and if you are familiar with the context, you will be able to distill all of that information yourself.
If you are here you probably don’t need much convincing about the utility of maps, but it is worth reminding ourselves why they are useful so we can make the optimal figure for the objectives we set out to accomplish. If you really want someone to be able to tell how far property crimes are from public transit stations, you will probably want to include the locations of transit stations as a layer on your map. If you want someone to know how clustered customers are, you should consider that a location with a high volume of points will be very hard to read without some additional tailoring.
After reading this post, you should be able to
Understand different types of spatial data, allowing you to load it into your environment and manage it
Execute simple data cleaning tasks that are specific to geographic data, making it easier to create the map you care about
Work with external data resources to create base maps
Create pretty static maps (the special dynamic case will be a subject of a future post)
Before we get to the fun stuff, we have to talk about the different types of spatial data you will encounter when you are starting a mapping-related project in R. It is easy to underappreciate just how complex spatial data structures are, and if you start downloading data to make maps without being familiar with the different file types, you will find yourself more frustrated than you need to be throughout the process.
Vector versus raster data could be its own post. Put simply, they are two different ways to represent spatial data. Rasters are continuous grids of cells, each containing a value that can be extracted. A raster might be a satellite image (a picture of how bright it is at night, or a picture showing how green the Earth is as a measure of phytomass). Rasters will generally have the file extension .tif.
Vector data may be more familiar to an R user. This is data that stores geographic information in a structure like: Data(Admin_Name, geometry)
where Admin_Name might be the names of places (Alabama, Alaska, Arizona,…) and geometry is a set of points that create one of the following shapes (a short sketch of each follows the list):
Points: a coordinate pair \((latitude, longitude)\)
Lines: a set of coordinate pairs \(((latitude_1, longitude_1),...(latitude_k,longitude_k))\) that form a line
Polygons: a set of coordinate pairs \(((latitude_1, longitude_1),...(latitude_k,longitude_k))\) that create a figure with more than one line (note: these do not need to be straight line segments, so when working with spatial data we would call a circle around a point a polygon even though in the technical geometry sense it is not. An annoying technicality).
Multipolygons: a set of polygons that can be attributed to one place or name (for example, say a country has an island off its coast that also belongs to the country. The multipolygon would be the country’s mainland borders plus its island borders.)
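To make these shapes concrete, here is a minimal sketch using sf’s geometry constructors (the coordinates are my own toy values; note that sf stores coordinates as (longitude, latitude), i.e. (x, y)):

library(sf)
# a point: one coordinate pair
pt <- st_point(c(-122.4, 37.8))
# a line: a matrix of coordinate pairs
ln <- st_linestring(rbind(c(-122.4, 37.8), c(-122.3, 37.9), c(-122.2, 37.85)))
# a polygon: a closed ring (first and last coordinates are the same)
pg <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
# a multipolygon: several polygons attributed to one feature
mp <- st_multipolygon(list(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))),
                           list(rbind(c(2, 2), c(3, 2), c(3, 3), c(2, 2)))))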
Typically, you will find vector data stored as shapefiles. These are usually zipped folders that contain several different files, one of which will have a .shp extension. The file with .shp is the one you would work with in R. You may also find vector data as a .csv with a geometry column, or as a GeoJSON.
As you might have guessed, our approach to reading geographic data will differ based on the type of data we are working with. For vector data, we will use ‘sf’. Let’s load it:
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
There is a reason I allowed the loading message to print. ‘sf_use_s2() is TRUE’ means that R will be using the ‘s2’ package for geocomputation. This matters; it is a default option that will treat data as being on a sphere (since Earth is round) instead of treating it as being projected onto a 2D flat space. This has consequences for distance metrics (how far A is from B) and for projections (how the shapes look). There are reasons to switch it off: many datasets were created using a 2D representation, and it may be advantageous to work with the data the way it was prepared and created, meaning you may not want this option to be live (in which case, you can switch it off by calling sf_use_s2(FALSE) after you load sf).
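For example, if part of your workflow assumes planar geometry, a minimal sketch of toggling the option looks like this:

# turn off spherical geometry so sf treats coordinates as planar
sf_use_s2(FALSE)
# ...planar computations here...
# turn it back on when you want spherical computations again
sf_use_s2(TRUE)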
Reading vector data in with ‘sf’ works much like a base R function such as read.csv(). We can simply call read_sf() with the filepath, taking care that our .shp file is contained in a folder that includes the other necessary supporting files (when you download a zipped shapefile, don’t just drag the .shp into a Dropbox folder or something like that. Keep the files together!)
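As a sketch, reading a shapefile would look like the following (the folder and file names here are just placeholders):

# read a shapefile into an sf data frame; the folder should also contain
# the .dbf, .shx, and .prj files that came zipped with the .shp
my_vector <- read_sf('data/my_shapefile/my_shapefile.shp')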
For this demo, we are going to use some canned data contained in the package tmap, which we will be using to make our maps.
library(tmap)
data('World')
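If you want a quick peek at what this vector data looks like, something like the following works; the geometry column is “sticky,” so it tags along even when you select other columns:

# each row of World is a country with attributes and a multipolygon geometry
head(World[, c('name', 'continent')])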
Now, let’s look at raster data. There is a package called raster that you could use; however, we will be using terra, which is a bit newer and faster.
library(terra)
There are many mapping packages in R. ggplot2 supports maps, and it is likely familiar to you if you have used R for data visualization before. There is also cartography, a newer package with some nice features (particularly discontinuity maps). We will be using tmap. It is fast, flexible, and makes it very easy to layer different types of spatial data (raster, vector, and, within vector data, lines, points, and polygons). This is a huge advantage since, as we mentioned at the top of this post, our objective for making a map may force us to include more than just some shading on top of some boxes.
We’ll be making a map that I recently created in a working paper with my co-author Melissa Pavlik. The paper is about how anti-grazing laws in Nigeria, which prohibit ruminants from grazing on grasslands without permits, shape conflict levels between herders and farmers. We wanted to create maps to show (1) the distribution of degraded land across Africa, with Nigeria highlighted, to show that the share of quality land available to farmers and herders is decreasing, and (2) which states in Nigeria have implemented the laws, and when they implemented them.
We can make a really simple map of our raster without any real processing. The raster we will be using is of the Revised Universal Soil Loss Equation (RUSLE) in 2001. RUSLE is a measure of soil erosion. The equation, which we do not need to dive into, estimates soil loss based on several inputs including rainfall, land practices, and topographical features. When RUSLE is higher, more land is degraded.
Let’s make our first map. The structure of tmap is to first specify the shape (the spatial data) that will be used, and then to declare the layer type (in our case, tm_raster, because we are working with a raster). Inside this function we can also set the legend title (title) and how we want the data binned (here we separate it into quantiles, but we could also use methods like Jenks or Fisher breaks). Here is the map.
rusle_01 = rast('RUSLE_SoilLoss_v1.1_yr2001_25km.tif')
tm_shape(rusle_01) + tm_raster(title = 'RUSLE', style = 'quantile') +
tm_legend(position = c('LEFT', 'BOTTOM'))
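If you want to try a different classification, swapping the style argument is all it takes. A sketch with Fisher breaks (the exact bins will depend on your data):

# same map, but with Fisher breaks instead of quantiles
tm_shape(rusle_01) + tm_raster(title = 'RUSLE', style = 'fisher') +
  tm_legend(position = c('LEFT', 'BOTTOM'))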
Recall that what we wanted to show our readers was soil loss in Africa. This map does that, but it is also giving us lots of information we don’t really care about communicating.
We can mask our raster data - essentially subsetting the data based on spatial parameters - to focus our map on the region we care about.
library(dplyr)
africa = World %>%
filter(continent == 'Africa')
nigeria = africa %>%
filter(name == 'Nigeria')
masked_raster <- mask(rusle_01, vect(africa))
What does the code above do? First, we filter our vector map of the world down to the countries of Africa, and then to Nigeria. Next, we use the mask function to restrict our raster to cells within Africa. The first argument to mask is our raster, and the second is the vector data we use as the mask, converted to a terra object with vect().
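One thing worth knowing (my own aside, not part of the paper’s code): mask() keeps the raster’s full extent and just sets cells outside the polygons to NA. If you also want to trim away the empty margins, you could crop first:

# crop to Africa's bounding box, then mask cells outside the country polygons
masked_raster <- mask(crop(rusle_01, vect(africa)), vect(africa))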
Great! We have prepped our data. Now we move on to making the actual map.
tm_shape(africa) + tm_borders(alpha = .1) +
tm_shape(masked_raster) + tm_raster(title = 'RUSLE', style = 'quantile') +
tm_shape(nigeria) + tm_borders('green', lwd = 2)
Maps are just like a good baked ziti - layering is important. We lay our first layer down, which is the base map of African countries. Notice we commanded tmap to draw the borders here. We could draw polygons instead; if we used tm_polygons, we would get a map with a grey background. That would definitely work, but I like the white background more. Inside tm_borders, we pass an argument called alpha, which controls the transparency of the border lines we draw. We want the focus of our map to be on what comes next (the actual soil loss), so we set alpha to be low. If someone wanted to eyeball what things looked like in Egypt they could, but we don’t want to draw everyone’s eyes there.
Then we have our second layer. This is the layer with the good stuff - the actual soil data. We include the shape name in tm_shape, and the tm_raster() call is the same as before.
Our final layer draws the reader’s eyes to the country that we study: Nigeria. We make the line green so it pops against the orange-red palette of our raster data, the line is thicker than the other borders (lwd = 2), and the default alpha is 1, so it is ten times less transparent than the other country borders. This draws the reader’s attention to the fact that soil erosion is a big problem in Nigeria.
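For reference, the tm_polygons variant mentioned above would look something like this sketch (grey country fills instead of a white background):

# same layering, but with filled polygons for the base map instead of borders
tm_shape(africa) + tm_polygons() +
  tm_shape(masked_raster) + tm_raster(title = 'RUSLE', style = 'quantile') +
  tm_shape(nigeria) + tm_borders('green', lwd = 2)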
That wraps up our first map.
Time for our second map. We want to show the states that passed grazing bans in Nigeria, and when the laws were passed, so our readers have a sense of the geography of the variable we are interested in. We could do this with a simple vector map, but there is a drawback: maybe we want readers to know a bit about the area surrounding Nigeria, and a vector map cannot show that unless we are willing to pull shapefiles for the neighboring countries. Further, maybe we want to communicate something about land class types (is the area covered in roads? is green space more or less dominant?), and a vector map cannot do that very easily. What should we do?
Maps are always about layering. We will create a base layer from OpenStreetMap, which will give us a nice background to paint our map on top of. After this, we will set the transparency of the polygons to something low so the reader can see the background.
First, time to work with our OpenStreetMap base layer, which will be a raster. Remember those? The thing is, rasters of this type are not like our first example. There, we had a raster whose cells held values. Other rasters, like this one, are images rendered from values across the red, green, and blue color bands. What would happen if we tried to use the tm_raster layer on an object like this?
data("World")
world_osm = rast(basemaps::basemap_geotif(World, map_service = "osm", map_type = "streets"))
## Warning: Transforming 'ext' to Web Mercator (EPSG: 3857), since 'ext' has a
## different CRS. The CRS of the returned basemap will be Web Mercator, which is
## the default CRS used by the supported tile services.
## Loading basemap 'streets' from map service 'osm'...
tm_shape(world_osm) + tm_raster()
What…is that? Well, we are getting this facet map because our raster ‘world_osm’ contains stacks of information (values for red, values for green, values for blue) that render an image when combined, but individually are simply plotted as three distinct intensities. tmap has a function that will properly process this information and render a sensible graphic for us.
That function is tm_rgb(). This is much better!
tm_shape(world_osm) + tm_rgb()
Some of inferring which layer function you should use for a raster is trial and error, but there are some logical guideposts. Is your data communicating something continuous? Then you probably want tm_raster. Is the data communicating something discrete, like the type of land across a continent? Then you are probably going to want tm_rgb. Using that as your prior, you should be able to make a map pretty easily once you get your raster, and if it looks like the strange one we plotted above when you use tm_raster, well, you will likely know the cause!
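Another quick sanity check (my own habit, not something the packages require) is to count the raster’s layers: an image-style raster typically has three bands, while a value raster typically has one.

# likely 3 (red, green, blue) for the OSM basemap; likely 1 for the RUSLE raster
terra::nlyr(world_osm)
terra::nlyr(rusle_01)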
Putting that issue aside, let’s get back to our second map. In this map, we wanted to show which states passed bans on grazing.
base_map_nigeria = basemaps::basemap_geotif(nigeria, map_service = "osm", map_type = "streets")
## Warning: Transforming 'ext' to Web Mercator (EPSG: 3857), since 'ext' has a
## different CRS. The CRS of the returned basemap will be Web Mercator, which is
## the default CRS used by the supported tile services.
## Loading basemap 'streets' from map service 'osm'...
base_map_nigeria = rast(base_map_nigeria)
state <- readRDS("state.rds") %>%
mutate(cohort = as.factor(cohort))
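One hedge worth mentioning: tmap will generally reproject layers on the fly, but if the state polygons do not line up with the basemap, you can put them in the basemap’s CRS yourself (Web Mercator, EPSG:3857, per the warning above):

# explicitly reproject the state polygons to Web Mercator to match the basemap
state <- st_transform(state, crs = 3857)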
tm_shape(base_map_nigeria) + tm_rgb() +
tm_shape(state) + tm_fill('cohort', alpha = .5, title = 'Year of AGL') +
tm_layout(legend.title.size = 1,
legend.text.size = 1,
legend.position = c("right","bottom"),
legend.bg.color = "white")
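If you want to export the finished figure, tmap_save() will write it to disk. A minimal sketch, where the file name and resolution are just placeholders:

# save the most recently printed tmap object as a high-resolution PNG
tmap_save(tmap_last(), filename = 'nigeria_agl_map.png', dpi = 300)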