Converting/Adding Spatial Coordinates

It’s been a while since I posted. Turns out grad school and two small children require some attention. I’ve been just awful about posting things, even though it really is a pretty painless and quick process, so I’m going to try to post something more regularly, even if it’s just a small tidbit of code, a cool paper, or whatever. I think it will help me get into the habit of making this a bit more of a blog (who knows, maybe even a useful blog!?) and less of a static reminder of all the things I haven’t done.

This post is about converting/adding/extracting X/Y coordinates in a dataframe. The new-ish sf package has been an amazing upgrade for working with spatial data in R: it’s typically much faster and requires less code than the previous methods I’ve used (though that could just be down to my coding abilities!). I recently had to convert some UTM point data into lat-long, and wanted to update my old code (which used rgdal, maptools & sp) to take advantage of the sf functions.

Spatial Data and CRS

Coordinate reference systems (CRS), or the lack thereof, are often the bane of anyone who works with spatial data. I can’t describe the frustration, anxiety, and misery that come with receiving data that are missing projection information, or the confusion when things plot askew because two datasets are in different CRSs, or when data don’t plot at all because they exist in another dimension and you have to learn how to warp time and space to retrieve them.

trying to figure out CRS is like time travel

The good thing is, I can show you some code that will help you convert from one CRS to another fairly painlessly. The bad thing is, I can’t help you find out what that missing CRS might be if you don’t already know it.
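If you do already know what the missing CRS should be, assigning it after the fact is a one-liner with sf’s st_set_crs(). A minimal sketch, where my_points stands in for a hypothetical sf object that arrived without projection info:

# label the data with its known CRS (this does NOT reproject anything, it just sets the label)
my_points <- st_set_crs(my_points, 4326) # e.g., WGS84 lat/long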

Some Data to Convert

I largely work with point data, which often arrives in spreadsheets or .csv files in various projections/datums. It might be WGS84 lat/long, and I need to convert it to UTM NAD83. Turns out this is pretty easy with the sf package (again, assuming you know the original CRS). You can look up the blinding array of spatial projections here.

First we’ll need to load a couple packages and then read in some example data. For this blog entry I’ll show a quick way to grab some table data from an html page and make a map of it.

suppressMessages({
  library(tidyverse)
  library(sf)
  library(htmltab)
})

Grab Some Hot Springs Data for CA

I’m going to use a table of hot springs with LAT and LONG columns. Let’s say we want to convert these data to UTM, or add a set of UTM columns so a colleague can use the data for a comparison. First, let’s scrape the data (I’m sure there’s a prettier way, but this just works):

url <- "http://www.hotspringsdirectory.com/usa/ca/gps-usa-ca.html"
df <- htmltab(url, which=1, header = 1,
              colNames = c("STATE","LAT","LONG","SpringName","Temp_F", "Temp_C", "AREA", "USGS_quad") ) # get the data

# check the column classes first:
sapply(df, class)
##       STATE         LAT        LONG  SpringName      Temp_F      Temp_C
## "character" "character" "character" "character" "character" "character"
##        AREA   USGS_quad
## "character" "character"
# now convert the coordinate and temperature columns to numeric:
cols.num <- c("LAT", "LONG", "Temp_F", "Temp_C")
df[cols.num] <- sapply(df[cols.num], as.numeric) # NAs appear where no temperature is available
## Warning in lapply(X = X, FUN = FUN, ...): NAs introduced by coercion

## Warning in lapply(X = X, FUN = FUN, ...): NAs introduced by coercion
df$LONG <- df$LONG * (-1) # make longitude negative; the table leaves out this important bit
head(df)
##   STATE    LAT     LONG                           SpringName Temp_F Temp_C
## 2    CA 38.767 -122.748                LITTLE        GEYSERS    210     99
## 3    CA 41.534 -120.078 HOT        SPRINGS (SURPRISE VALLEY)    208     98
## 4    CA 36.045 -117.769              COSO        HOT SPRINGS    207     97
## 5    CA 41.670 -120.206         LAKE        CITY HOT SPRINGS    207     97
## 6    CA 36.036 -117.802                DEVILS        KITCHEN    207     97
## 7    CA 40.421 -121.375               TERMINAL        GEYSER    205     96
##           AREA              USGS_quad
## 2   SANTA ROSA (WHISPERING PINES 7.5)
## 3      ALTURAS          CEDARVILLE 15
## 4 DEATH VALLEY    HAIWEE RESERVOIR 15
## 5      ALTURAS          CEDARVILLE 15
## 6 DEATH VALLEY    HAIWEE RESERVOIR 15
## 7   SUSANVILLE        MT. HARKNESS 15

The chunk above does the following:

  • htmltab() grabs the first table on the page, parses it into a dataframe, and applies our custom column names. We should end up with 303 rows and 8 columns of data, including the temperature in °F and °C.
  • sapply(df, class) shows that every column came in as character, so we pick out the columns we want as numeric.
  • Applying as.numeric across those columns with sapply does the conversion; the warnings just flag the springs with no recorded temperature (a dplyr version of the same cleanup is sketched below).
  • Finally, we make the longitude negative so the points plot in the right hemisphere. :)
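If you prefer the dplyr idiom (tidyverse is already loaded), the same cleanup could look roughly like this. Just an equivalent sketch; across() assumes a reasonably recent dplyr (>= 1.0):

# equivalent cleanup, dplyr-style (requires dplyr >= 1.0 for across())
df <- df %>%
  mutate(across(all_of(c("LAT", "LONG", "Temp_F", "Temp_C")), as.numeric)) %>%
  mutate(LONG = LONG * -1) # make longitude negative (western hemisphere)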

Make Data Spatial

Ok, now we need to make the data spatial by assigning a projection or CRS. Let’s assume these data are WGS84, which is EPSG code 4326 and one of the most commonly used CRSs for mapping. The sf package has a very easy way of doing this with the st_as_sf function. Almost all sf functions begin with st_, which is nice and makes it easier to search through the library.
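As a quick aside, you can list everything sf exports with that prefix straight from the console (nothing fancy, just base R’s ls()):

# browse all of sf's st_* functions
ls("package:sf", pattern = "^st_")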

Below we provide the CRS we looked up, along with the names of the columns that hold the coordinates. One tip: these should be listed in X then Y order, which means LONG then LAT, or Easting then Northing if you’re working in UTM.

# make the lat/long columns spatial (X = LONG/Easting, Y = LAT/Northing)
df.SP <- st_as_sf(df, coords = c("LONG", "LAT"), crs = 4326)

Notice the only change is the addition of a list-column to our dataframe called geometry, which has class sfc_POINT. This means our data is now fully “spatial” in the sense that it can easily be exported to another format, like a .shp file, or transformed to a different CRS.
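A quick way to confirm what changed (pure inspection, nothing new is created here):

class(df.SP)          # now both "sf" and "data.frame"
class(df.SP$geometry) # the new list-column: "sfc_POINT" "sfc"
head(df.SP$geometry)  # first few point geometries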

We can always view the full projection info for a CRS by using the st_crs function, or check the CRS of our current data frame. Notice that if we had built the sf object without supplying a crs, st_crs() would just return NA.

st_crs(32610) # full definition for WGS84 / UTM zone 10N (the CRS we'll transform to later)
## $epsg
## [1] 32610
##
## $proj4string
## [1] "+proj=utm +zone=10 +datum=WGS84 +units=m +no_defs"
##
## attr(,"class")
## [1] "crs"
st_crs(df.SP) 
## $epsg
## [1] 4326
##
## $proj4string
## [1] "+proj=longlat +datum=WGS84 +no_defs"
##
## attr(,"class")
## [1] "crs"

Transform or Re-Project

Ok! A few final steps and we’re there. Now that our data is an sf object, it’s fairly straightforward to switch projections, grab the new X/Y values, and append them to our dataframe for future use. Here we transform from lat-long to UTM (zone 10N) and add those columns, then transform back to lat-long and add those columns too, just because repetition is fun practice.

# transform to UTM (EPSG 32610, WGS84 / UTM zone 10N)
df.SP <- st_transform(x = df.SP, crs = 32610)

# pull the UTM coordinates and add them back to the dataframe
df.SP$utm_E <- st_coordinates(df.SP)[,1] # X = Easting
df.SP$utm_N <- st_coordinates(df.SP)[,2] # Y = Northing

# now switch back to lat-long
df.SP <- st_transform(x = df.SP, crs = 4326)

# add the lat-long coordinates to the dataframe
df.SP$lon <- st_coordinates(df.SP)[,1] # X = longitude
df.SP$lat <- st_coordinates(df.SP)[,2] # Y = latitude
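As an aside, the same result can be had a bit more compactly by computing the coordinate matrix once per transform and indexing it. Just an alternative to the chunk above, not an additional step:

# compact alternative: one st_coordinates() call per CRS
utm_xy <- st_coordinates(st_transform(df.SP, 32610))
df.SP$utm_E <- utm_xy[, 1] # Easting
df.SP$utm_N <- utm_xy[, 2] # Northing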

At this point we have a nice dataframe with two different sets of coordinates for anyone to use. We could write these data to a .csv, a shapefile, or really whatever format we’d like (see the sketch below). And if we want to remove the geometry column and make the data non-spatial again, we can use the st_set_geometry() function.
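For instance, exporting the spatial version is a one-liner with st_write(), and the plain table can go out with write_csv(); the file names below are just placeholders:

# export the spatial version (file names are placeholders)
st_write(df.SP, "ca_hotsprings.gpkg") # GeoPackage (use a .shp extension for a shapefile)

# or drop the geometry on the fly and write a plain csv
write_csv(st_set_geometry(df.SP, NULL), "ca_hotsprings.csv")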

# coerce back to data.frame:
df.SP <- st_set_geometry(df.SP, NULL)

Map It

Let’s make a quick leaflet map and show these data off, just because it seems grim to have a spatial blog without a map.

library(leaflet)

leaflet() %>%
  addTiles() %>%
  addProviderTiles("Esri.WorldTopoMap", group = "Topo") %>%
  addProviderTiles("Esri.WorldImagery", group = "ESRI Aerial") %>%
  # note: "color" sets the stroke color; "fill"/"fillColor" control the fill
  addCircleMarkers(data = df.SP, group = "Hot Springs", radius = 4, opacity = 1,
                   stroke = TRUE, color = "darkblue", weight = 2,
                   fill = TRUE, fillColor = "yellow", fillOpacity = 0.75,
                   popup = paste0("Spring Name: ", df.SP$SpringName,
                                  "<br> Temp_F: ", df.SP$Temp_F,
                                  "<br> Area: ", df.SP$AREA)) %>%
  addLayersControl(
    baseGroups = c("Topo", "ESRI Aerial"),
    overlayGroups = c("Hot Springs"), # must match the group name used above
    options = layersControlOptions(collapsed = TRUE))

Pretty cool!