Making maps in R

I’ve had a lot of people ask me to make maps for them recently. There seems to be a magical aura around cartography where people say “I can’t make maps” or “I don’t do GIS”. It’s really not that hard. IF you don’t want to pay the big $$ for an ArcGIS licesne, you can make a quite servicable map with R.

There are a lot of different packages that you can use for map making, so one of the bigest issues folks have when they are first starting is knowing which package to start with.

‘sf’ which stands for “simple features” is one of the most user-friendly packages for basic map making (using vector graphics). It was built by the team that brought you the tidyvserse, so you know it’s probably good.

You can check out the website for more info: https://r-spatial.github.io/sf/

A little background on GIS

In a normal data set, you will have a table where each observation is a row, and each attribute associated with that observation is a column.

We’ll use stations from EMP’s discrete water quality survey as an example.

EMP = read.csv("EMP_Discrete_Water_Quality_Stations.csv")
EMP
##    Station                                           Location StationType
## 1      C10                  San Joaquin River near Vernalis   Shore-based
## 2     C10A         San Joaquin River near Vernalis @ SJR Club Shore-based
## 3       C3                 Sacramento River @ Green's Landing Shore-based
## 4      C3A                            Sacramento River @ Hood Shore-based
## 5       C7 San Joaquin River from Boat Dock @ Mossdale Bridge Shore-based
## 6       C9                  West Canal @ Clifton Court Intake Shore-based
## 7      D10                   Sacramento River @ Chipps Island Mid-channel
## 8      D11                          Sherman Lake near Antioch Mid-channel
## 9      D12           San Joaquin River @ Antioch Ship Channel Mid-channel
## 10    D14A                              Big Break near Oakley Mid-channel
## 11     D15                   San Joaquin River @ Jersey Point Mid-channel
## 12     D16               San Joaquin River @ Twitchell Island Mid-channel
## 13     D19                 Frank's Tract near Russo's Landing Mid-channel
## 14      D2                      Suisun Bay near Preston Point Mid-channel
## 15     D22                         Sacramento River @ Emmaton Mid-channel
## 16     D24            Sacramento River below Rio Vista Bridge Shore-based
## 17     D26                  San Joaquin River @ Potato Slough Mid-channel
## 18    D28A                         Old River @ Rancho Del Rio Mid-channel
## 19      D4            Sacramento River above Point Sacramento Mid-channel
## 20    D41                     San Pablo Bay near Pinole Point Mid-channel
## 21    D41A         San Pablo Bay near Mouth of Petaluma River Mid-channel
## 22     D42                     San Pablo Bay near Mare Island Mid-channel
## 23      D6               Suisun Bay @ Bulls Head nr. Martinez Mid-channel
## 24      D7            Grizzly Bay @ Dolphin nr. Suisun Slough Mid-channel
## 25      D8            Suisun Bay off Middle Point nr. Nichols Mid-channel
## 26    MD10                 Disappointment Slough @ Bishop Cut Shore-based
## 27   MD10A                  Disapointment Slough @ Bishop Cut Mid-channel
## 28     MD6                         Sycamore Slough near Mouth Mid-channel
## 29     MD7         South Fork Mokelumne below Sycamore Slough Mid-channel
## 30    MD7A                   Little Potato Slough @ Terminous Mid-channel
## 31   NZ002             Carquinez Straite near Glencove Harbor Mid-channel
## 32   NZ004                   Carquinez Straite near Ozol Pier Mid-channel
## 33   NZ032              Montezuma Slough, 2nd bend from mouth Mid-channel
## 34   NZ068            Sacramento River below Rio Vista Bridge Mid-channel
## 35   NZ325          San Pablo Bay near Rock Wall and Light 15 Mid-channel
## 36   NZS42                     Suisun Slough @ Volanti Slough Mid-channel
## 37     P10                      Middle River @ Victoria Canal Shore-based
## 38    P10A                         Middle River @ Union Point Shore-based
## 39     P12                      Old River @ Tracy Road Bridge Shore-based
## 40    P12A                             Old River @ Oak Island Shore-based
## 41      P2             Mokelumne River @ Franklin Road Bridge Shore-based
## 42      P8                   San Joaquin River @ Buckley Cove Mid-channel
## 43     S42         Suisun Slough 300' south of Volanti Slough Shore-based
##    Latitude Longitude   Status AverageDepth
## 1  37.67575 -121.2650 Inactive            3
## 2  37.67934 -121.2647   Active            4
## 3  38.34575 -121.5461 Inactive            2
## 4  38.36771 -121.5205   Active            1
## 5  37.78607 -121.3077 Inactive            1
## 6  37.83095 -121.5540   Active           45
## 7  38.04631 -121.9183   Active           23
## 8  38.04229 -121.7995 Inactive           12
## 9  38.02161 -121.8063   Active            6
## 10 38.01776 -121.7114 Inactive           23
## 11 38.05217 -121.6896 Inactive            4
## 12 38.09690 -121.6691   Active            1
## 13 38.04376 -121.6148   Active           35
## 14 38.06544 -122.0545 Inactive            6
## 15 38.08453 -121.7391   Active            4
## 16 38.15778 -121.6847 Inactive            2
## 17 38.07664 -121.5669   Active            1
## 18 37.97048 -121.5730   Active           34
## 19 38.06248 -121.8205   Active            5
## 20 38.03022 -122.3729   Active           74
## 21 38.08472 -122.3907   Active           23
## 22 38.05872 -122.2847 Inactive           43
## 23 38.04436 -122.1177   Active            4
## 24 38.11714 -122.0397   Active            5
## 25 38.05992 -121.9900   Active           34
## 26 38.04381 -121.4188 Inactive            3
## 27 38.04226 -121.4199   Active            4
## 28 38.14150 -121.4687 Inactive            2
## 29 38.12513 -121.4970 Inactive            1
## 30 38.11382 -121.4980 Inactive            1
## 31 38.06529 -122.2152   Active           45
## 32 38.03576 -122.1616   Active           23
## 33 38.16991 -122.0211   Active           12
## 34 38.14272 -121.6895   Active            6
## 35 38.05798 -122.2919   Active           23
## 36 38.18045 -122.0476   Active            4
## 37 37.89120 -121.4894 Inactive            1
## 38 37.89126 -121.4883 Inactive           35
## 39 37.80472 -121.4500 Inactive            6
## 40 37.80284 -121.4569 Inactive            4
## 41 38.25542 -121.4403 Inactive            2
## 42 37.97817 -121.3823   Active            1
## 43 38.18047 -122.0469 Inactive           34

You will notice that we have one line per station with various attributes, such as Station Type, Location, Latitude, Longitude, and depth.

(I actually just made up the depths for demonstration purposes, don’t use them for anything else)

The columns “latitude” and “longitude” tell you were the stations are, but R won’t recognize this as spatial data in this format. We need to identify which columns have the spatial component and what the coordinate reference system is. If you are not familiar with coordinate reference systems, check out this cheat sheet

maps with different reference systems

Components of a CRS

The coordinate reference system is made up of several key components:

  • Coordinate system:The X, Y grid upon which your data is overlayed and how you define where a point is located in space.
  • Horizontal and vertical units: The units used to define the grid along the x, y (and z) axis.
  • Datum:A modeled version of the shape of the Earth which defines the origin used to place the coordinate system in space.
  • Projection Information: The mathematical equation used to flatten objects that are on a round surface (e.g. the Earth) so you can view them on a flat surface (e.g. your computer screens or a paper map).

Geographic versus projected CRS

A geographic coordinate system locates latitude and longitude location using angles. Thus the spacing of each line of latitude moving north and south is not uniform. (example, WGS84)

Geogrphic CRS

A projected coordinate system locates latitude and longitude using static units. The line spacing is uniform so it is easier to transform and do analyses. However, they are usually only good for a small part of the earth (Example UTM)

Projected CRS

We can specify the CRS via a list of the components, or by it’s numeric code .

EMP uses WGS84, which is number 4326.

library(sf)
EMPsf = st_as_sf(EMP, coords = c("Longitude", "Latitude"),  crs = 4326)

EMPsf
## Simple feature collection with 43 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.3907 ymin: 37.67575 xmax: -121.2647 ymax: 38.36771
## geographic CRS: WGS 84
## First 10 features:
##    Station                                           Location StationType
## 1      C10                  San Joaquin River near Vernalis   Shore-based
## 2     C10A         San Joaquin River near Vernalis @ SJR Club Shore-based
## 3       C3                 Sacramento River @ Green's Landing Shore-based
## 4      C3A                            Sacramento River @ Hood Shore-based
## 5       C7 San Joaquin River from Boat Dock @ Mossdale Bridge Shore-based
## 6       C9                  West Canal @ Clifton Court Intake Shore-based
## 7      D10                   Sacramento River @ Chipps Island Mid-channel
## 8      D11                          Sherman Lake near Antioch Mid-channel
## 9      D12           San Joaquin River @ Antioch Ship Channel Mid-channel
## 10    D14A                              Big Break near Oakley Mid-channel
##      Status AverageDepth                   geometry
## 1  Inactive            3  POINT (-121.265 37.67575)
## 2    Active            4 POINT (-121.2647 37.67934)
## 3  Inactive            2 POINT (-121.5461 38.34575)
## 4    Active            1 POINT (-121.5205 38.36771)
## 5  Inactive            1 POINT (-121.3077 37.78607)
## 6    Active           45  POINT (-121.554 37.83095)
## 7    Active           23 POINT (-121.9183 38.04631)
## 8  Inactive           12 POINT (-121.7995 38.04229)
## 9    Active            6 POINT (-121.8063 38.02161)
## 10 Inactive           23 POINT (-121.7114 38.01776)

You will notice that instead of a “latitude” and “logitude” column, we now have a “geometry” column that contains both latitude and longitude.

Now let’s plot these points.

library(ggmap)

ggplot() + geom_sf(data = EMPsf)

OK, so that’s pretty easy! But it would be nice if we had an outline of the waterways so the points aren’t floating out in space. I’ve got this shapefile of the waterways of teh Delta, and it’s pretty easy to import and plot it in sf.

waterways = read_sf("hydro_delta_marsh.shp")
waterways
## Simple feature collection with 282 features and 9 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -122.6408 ymin: 37.41522 xmax: -120.9357 ymax: 38.67781
## geographic CRS: NAD83
## # A tibble: 282 x 10
##      AREA PERIMETER HYDRO_POLY HYDRO_PO_1 HYDRO_24K_ TYPE  HNAME Shape_Leng
##     <dbl>     <dbl>      <int>      <int>      <int> <chr> <chr>      <dbl>
##  1 7.35e7  1033340         791        797        798 MR    SACR~    2.45   
##  2 8.76e4     3319.       1965       1963       1964 S     W        0.0357 
##  3 7.92e6    87428.       1967       1965       1966 C     SACT~    0.829  
##  4 1.04e5     2719.       1970       1969       1970 L     GREE~    0.0264 
##  5 1.06e5     2798.       1977       1974       1975 L     LAKE~    0.0283 
##  6 1.59e5     3392.       1982       1978       1979 S     W        0.0314 
##  7 4.26e4     1003.       1992       1989       1990 S     W        0.00952
##  8 5.65e3      498.       2001       2008       2009 MR    SOUT~    0.00548
##  9 4.14e3      502.       2006       2012       2013 MR    SOUT~    0.00536
## 10 9.78e4     6630.       2008       2011       2012 MR    SOUT~    0.0746 
## # ... with 272 more rows, and 2 more variables: Shape_Area <dbl>,
## #   geometry <POLYGON [°]>

Notice that the ‘geometery’ column says ‘polygon’ instead of ‘point’ and it is actually a list with a bunch of points outlining the feature.

ggplot() + 
  geom_sf(data = waterways)+ 
  geom_sf(data = EMPsf) 

Tweaking the output

Excellent! Now let’s change some things. Fortunately, the ‘ggmap’ package can accept any of the ggplot themes, and all of the ggplot calls you might already know. You can also vary the shape, size, and color of the elements.

ggplot() + 
  geom_sf(data = waterways)+ 
  geom_sf(data = EMPsf, aes(color = StationType, size = AverageDepth)) +
  theme_bw()

We usually want a scale bar and north arrow on our maps. There are nice little built-in fucntions in the package “ggsn” that will do that for you.

library(ggsn)


ggplot() +
  geom_sf(data = waterways)+ 
  geom_sf(data = EMPsf, aes(color = StationType, size = AverageDepth)) +
  theme_bw()+
  
  #You can adjust the size, units, etc of your scale bar.
 scalebar(data = EMPsf, dist = 20, dist_unit = "km",
           transform = TRUE, st.dist = .1) +
  
  #there are a number of different optinos for north arrow symbols. ?north
  north(data = waterways, symbol = 2) 

Basemaps

We’d also like a basemap so we can see the rest of the world.

This is also a good opportunity to review rasters versus vectors. Spatial data can either be represented as a grid of equal-sized boxes, with data associated with each box (raster), or a group of polygons, with data associated with each point/line/polygon. The delta waterways shapefile is a vector - each river is made up of a list of points and has lines connecting the points.

Diagram of rasters versus vectors

Basemaps are usually rasters, rather than vectors, and have very little information associated with them. Each pixel just has a color associated with it, so it’s just a picture of a map.

There are a number of free basemap sources online, including Google Maps, but Stamen maps are one of the easiest to work with.

#download your map from online
basemap = get_stamenmap(bbox = c(left = -122.6, 
                                 right = -121.0, bottom = 37.5, top = 38.6),
                        maptype = "terrain")

#use "ggmap" rather than "ggplot" to work with the basemap

ggmap(basemap)+ #plot the basemap
  
  geom_sf(data = waterways, inherit.aes = F)+ #add teh waterways. Be sure to use "inherit.aes=F"
  
  geom_sf(data = EMPsf, aes(color = StationType, size = AverageDepth),inherit.aes = FALSE) + #add EMP

  #add scalebar and northa rrow
 scalebar(data = EMPsf, dist = 20, dist_unit = "km",
           transform = TRUE, st.dist = .1) +
  scale_color_discrete(guide = FALSE) +
  north(data = waterways, symbol = 2) 

## Labels

We can label our points with geom_sf_text and add annotations with “annotate”

ggmap(basemap, darken = c(.5, "white"))+ #plot the basemap and lighten it a little
  
  geom_sf(data = waterways, inherit.aes = F, fill = "lightblue")+ #add the waterways
  
  geom_sf(data = EMPsf, aes(color = StationType, size = AverageDepth),inherit.aes = FALSE) + #add EMP points
  
 scalebar(data = EMPsf, dist = 20, dist_unit = "km",
           transform = TRUE, st.dist = .1) + #add a scale bar
  
  north(data = EMPsf, symbol = 3) + #Add north arrow
  
geom_sf_text(data = EMPsf, aes(label = Station), #label the points
             nudge_x = 0.07, size = 3, #adjust size and position relative to points
             inherit.aes = FALSE,
             check_overlap = T) + #stop labels from being drawn over each other
annotate("text", label = "Yay Maps!", x = -122.2, y = 38.4, size = 8)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data

And there you have it. This map obviously needs a fair amount of work to be beautiful, but you can get it going relatively easily. If you need a copy of the Delta shapefile, check out the “deltamapr” package: https://github.com/InteragencyEcologicalProgram/deltamapr