Part II: Download NHD Flowline Data and Use sf Functions

In the last post (Part I), I showed how to calculate distance matrices and find the nearest points between two sets of point data. Now that we know how to identify the nearest neighbor for one set of points relative to another, let’s look at how we can grab some line data (in this post, river networks) and crop it based on our point data. I’ll show how to download a few different sets of river line data, how to buffer, and how to crop/intersect data based on other polygons or point data. The next post (Part III) will show how to snap the spatial point data to the river network (line) data using the riverdist package, and how to calculate the river distance between sites (i.e., the distance along the river network).

Load the Packages

The main packages I’m going to use in this post:

# load libraries
suppressMessages({
  library(dplyr); # data munging and piping
  library(purrr); # for list functions
  library(ggplot2); # plotting
  library(ggrepel) # for labeling
  library(sf); # spatial simple features
  library(USAboundaries); # state/county data
  library(Imap); # nice mapping/color functions
  #library(geoknife); # USGS tool set (Next post)
  #library(dataRetrieval); # USGS tool set (next post)
  library(httr) # scraping webdata (to download flowlines)
})

Note: to plot sf objects with ggplot2 (via geom_sf()), you need a fairly recent version of ggplot2 (> 2.2.1). A quick and easy check you can add to your code is an if statement:

if (utils::packageVersion("ggplot2") <= "2.2.1") stop("please update ggplot2 to a version > 2.2.1 to plot sf objects")
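
If your installed version doesn’t pass that check, you’ll need to update ggplot2; depending on when you read this, geom_sf() may only be available in the development version, which (assuming you have devtools installed) can be grabbed from GitHub:

# only needed if your ggplot2 is too old to provide geom_sf()
# devtools::install_github("tidyverse/ggplot2")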

Download River Line Data

I’m going to demo two different sets of watershed/river data that are openly accessible and that, I think, represent very good data for these types of analyses.

  • US: If you work in the US, the go-to data is typically the National Hydrography Dataset (NHD). This dataset has a myriad of different components and is linked to the USGS NWIS. These are really amazing, publicly available data, so I’ll walk through just a few of the things you can do.
  • Global: There’s a cool dataset with global coverage, including both streams and lakes, called HydroSHEDS. It’s based on high-resolution elevation data collected during a Space Shuttle flight as part of NASA’s Shuttle Radar Topography Mission (SRTM). To access the data you’ll need to create a free account (just an email required), but once that’s done you can download any piece of these data. I’ll show the same example with these data as well, and provide a snippet folks can download if need be.

Using NHD Data

There are many components of the NHD dataset, but thankfully folks over at the Office of Water Information at USGS have built some great tools and code for working with NHD data. I’ll show how to download the NHD river data at different scales, and how to crop it to an area of interest using another polygon or a set of points. If you want to see a few cool posts that utilize the USGS tools and data, I definitely recommend checking out their blog posts.

Load the NHDFlowline Function

First we are going to load a few libraries and a function that we need to download the NHD data. This function is from the blogs linked above, and it requires you to specify a streamorder and a mapRange. Note that a streamorder of 1 downloads stream layers at the finest scale possible (headwater streams are order 1; larger rivers are usually order 4–5 or greater).

I’m not going to go into much detail on this function, but the original function comes from a gist by Laura DeCicco (see here), and I modified it slightly to work with the sf package.

library(httr)

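# Download NHDPlus flowlines from the USGS web service (WFS request).
#   streamorder: smallest stream order to keep (1 returns all streams)
#   mapRange:    bounding box as c(xmin, xmax, ymin, ymax) in lon/lat (EPSG:4326)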
get_flowlines <- function(streamorder, mapRange){
  postURL <- "https://cida.usgs.gov/nwc/geoserver/nhdplus/ows"
  
  filterXML <- paste0('<?xml version="1.0"?>',
                '<wfs:GetFeature xmlns:wfs="http://www.opengis.net/wfs" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gml="http://www.opengis.net/gml" service="WFS" version="1.1.0" outputFormat="shape-zip" xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.1.0/wfs.xsd">',
                  '<wfs:Query xmlns:feature="https://gov.usgs.cida/nhdplus" typeName="feature:nhdflowline_network" srsName="EPSG:4326">',
                    '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">',
                      '<ogc:And>',
                        '<ogc:PropertyIsGreaterThan>',
                          '<ogc:PropertyName>streamorde</ogc:PropertyName>',
                          '<ogc:Literal>',streamorder-1,'</ogc:Literal>',
                        '</ogc:PropertyIsGreaterThan>',
                        '<ogc:BBOX>',
                          '<ogc:PropertyName>the_geom</ogc:PropertyName>',
                          '<gml:Envelope>',
                            '<gml:lowerCorner>',mapRange[3]," ",mapRange[1],'</gml:lowerCorner>',
                            '<gml:upperCorner>',mapRange[4]," ",mapRange[2],'</gml:upperCorner>',
                          '</gml:Envelope>',
                        '</ogc:BBOX>',
                      '</ogc:And>',
                    '</ogc:Filter>',
                  '</wfs:Query>',
                '</wfs:GetFeature>')

  destination = file.path(tempdir(),"nhdflowline_network.zip")
  file <- POST(postURL, body = filterXML, write_disk(destination, overwrite=T))

  filePath <- tempdir()
  print("unzipping...")
  unzip(destination, exdir = filePath)
  
  flowLines <- st_read(filePath, layer = 'nhdflowline_network')

  return(flowLines)
}
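
To show how this gets used, here’s a minimal sketch of a call with a made-up bounding box; the coordinates below are purely illustrative, and the streamorder of 4 is just an arbitrary choice to keep the download small. We’ll build the real mapRange from a HUC8 boundary in a moment. Note the c(xmin, xmax, ymin, ymax) ordering, since that’s how the function indexes the vector.

# purely illustrative lon/lat extent: c(xmin, xmax, ymin, ymax)
ex_range <- c(-121.6, -120.0, 38.6, 39.8)
# this queries the USGS web service and downloads a zipped shapefile to a temp directory
ex_rivers <- get_flowlines(streamorder = 4, mapRange = ex_range)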

Download Some Boundary Data: State/County/HUC

Now we have a function to download river network or flowline data. But it’s probably more useful to specify a boundary or location which we can use to subset our flowline data. For this example I’ll show how to get data by county/state with the USAboundaries package, or by using a watershed boundary.

State & County Data

library(USAboundaries) # STATES/counties data
library(ggrepel) # for labeling

# set the state and county names of interest
state_names <- c("california")
co_names <- c("Butte", "Placer", "El Dorado", "Nevada", "Yuba", "Sierra", "Plumas")

# get STATE data
CA<-us_states(resolution = "high", states = state_names) %>%
  st_transform(crs = 4326)

# get COUNTY data for a given state
counties_spec <- us_counties(resolution = "low", states=state_names) %>% # use list of state(s) here
  filter(name %in% co_names) %>% # filter to just the counties we want
  mutate(lon=map_dbl(geometry, ~st_centroid(.x)[[1]]), # add centroid values for labels
         lat=map_dbl(geometry, ~st_centroid(.x)[[2]])) # add centroid values for labels

# get range of lat/longs from counties for mapping and river function
mapRange1 <- c(range(st_coordinates(counties_spec)[,1]),range(st_coordinates(counties_spec)[,2]))

# Make a quick Map:

ggplot() + 
  geom_sf(data=CA, color = "gray30", lwd=2, fill=NA) +
  geom_sf(data=counties_spec, fill = NA, show.legend = F, color="gray50", lwd=0.4) +
  geom_label_repel(data=counties_spec, aes(x=lon, y=lat, label=name)) +
  coord_sf(xlim = mapRange1[c(1:2)], ylim = mapRange1[c(3:4)]) +
  theme_bw()

HUC Data

For watershed boundaries, I’m using hydrologic unit code (HUC) boundaries, specifically HUC8, because it’s a decent-sized watershed and makes a good example. If you want to download HUC data on your own (there are many other datasets you can download as well), you can visit the NRCS website and:

  • Select your area of interest.
  • Accept the area.
  • Select the data you want to download for that area (e.g., Hydrologic Units: 8 digit Watershed Boundary Dataset).
  • Select your output format (e.g., shapefile) and a projection (e.g., WGS84).
  • Enter an email, and you are good to go.

If you want to grab the data I’m using for this example, you can download it here on github.

I showed this in Part I, but I’m just unzipping the file, reading it in, and then removing the unzipped files so that only the sf object remains in my environment. You could also filter down to a single HUC8 to simplify things further, but I’ll keep all four watersheds here.

huc8 <- read_sf(unzip("data/h8_AMR_BEA_YUB.zip"), quiet = F) %>%
  st_transform(crs=4326)
## Reading layer `h8_AMR_BEA_YUB' from data source `/Users/ryanpeek/Documents/github/ryanpeek.github.io/h8_AMR_BEA_YUB.dbf' using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 5 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -121.5976 ymin: 38.61455 xmax: -119.9829 ymax: 39.77696
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
# then remove raw files since file is added in memory
file.remove(list.files(pattern = "h8_AMR_BEA_YUB*",recursive = F))
## [1] TRUE TRUE TRUE TRUE
h8 <- huc8 # rename for now

# get map range from this layer for flowlines call
mapRange <- c(range(st_coordinates(h8)[,1]),range(st_coordinates(h8)[,2]))

Great! Now we have some boundaries to work with. Let’s take a look at these pieces (the State/counties and the HUC8) on a ggplot2 map. Notice there is a geom_sf call for the sf data. For this example, I’m showing both a map where I don’t specify an extent (so it defaults to the extent of the first layer, which is CA), and a map with coord_sf using mapRange1, the extent (or min/max bounding box) of the counties. Notice the difference.

# defaults to extent of first layer
ggplot() + 
  geom_sf(data=CA, color = "gray30", lwd=2, fill=NA) +
  geom_sf(data=counties_spec, fill = NA, show.legend = F, color="gray50", lwd=0.4) +
  geom_label_repel(data=counties_spec, aes(x=lon, y=lat, label=name)) +
  geom_sf(data=h8, aes(fill=HU_8_NAME), alpha=0.3, color="maroon")+
  labs(title="CA with selected counties & HUC8s", y="Latitude", x="Longitude") +
  theme_bw(base_family = "Roboto Condensed")
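
And here’s a sketch of the second version, cropped with coord_sf to mapRange1 (the county bounding box computed earlier); aside from the coord_sf line and a tweaked title, it uses the same set of layers as the map above.

# same layers, but cropped to the extent of the selected counties
ggplot() + 
  geom_sf(data=CA, color = "gray30", lwd=2, fill=NA) +
  geom_sf(data=counties_spec, fill = NA, show.legend = F, color="gray50", lwd=0.4) +
  geom_label_repel(data=counties_spec, aes(x=lon, y=lat, label=name)) +
  geom_sf(data=h8, aes(fill=HU_8_NAME), alpha=0.3, color="maroon") +
  labs(title="Selected counties & HUC8s (cropped to county extent)", y="Latitude", x="Longitude") +
  coord_sf(xlim = mapRange1[c(1:2)], ylim = mapRange1[c(3:4)]) +
  theme_bw(base_family = "Roboto Condensed")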