---
title: "Pulling Data"
format:
  html:
    eval: false
---
Below is the code used to import all of the data sets used in this workshop. There are many R packages that let you connect to open-source spatial databases and pull spatial data directly into your R session (without having to download and read in the data separately). Some packages require an API key before you can import data; these keys are free to sign up for.
## Load in Libraries
This function checks whether each package is installed on your local system, installs any that are missing, and then loads all of them into your R session. If you don't have some of these packages installed, this step may take a little while.
```{r}
#| warning: false
#| message: false
#|
packageLoad <- function(x) {
  for (i in 1:length(x)) {
    if (!x[i] %in% installed.packages()) {
      install.packages(x[i])
    }
    library(x[i], character.only = TRUE)
  }
}

packageLoad(c("elevatr", "rgbif", "tidycensus", "tigris", "sf", "terra",
              "dplyr", "tidyr", "readr"))
```
For this workshop we are focusing on the state of Colorado, so we first download some political boundaries to use to filter the raster layers we want returned. The `tigris` package allows you to directly download TIGER/Line shapefiles from the US Census Bureau.
```{r}
#| message: false
#| warning: false
#| results: hide
# download county shapefile for the state of Colorado
counties <- tigris::counties(state = "CO")
```
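It can be worth sanity-checking what `tigris` returned before moving on. A minimal sketch (recent versions of `tigris` return an `sf` object, and Colorado has 64 counties):

```{r}
# counties should be an sf data frame with one row per county
class(counties)
nrow(counties)

# quick base-R plot of the county boundaries
plot(sf::st_geometry(counties))
```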
## Raster Data
Get elevation data using the [`elevatr`](https://github.com/jhollist/elevatr) package. The function `get_elev_raster()` returns a raster digital elevation model (DEM) from the AWS Open Data Terrain Tiles. For this function you must supply a spatial object specifying the extent of the returned elevation raster and the resolution (specified by the zoom level `z`). We are importing elevation at \~ 1km resolution (\~ 900 m).
```{r}
#| eval: false
# get elevation at ~1km (864m)
elevation <- get_elev_raster(counties, z = 7)

# get_elev_raster() returns a RasterLayer, so convert it to a terra
# SpatRaster before saving to file as a GeoTIFF
terra::writeRaster(terra::rast(elevation), "data/elevation_1km.tif")
```
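If you are following along with the workshop data, the saved GeoTIFF can be read back in as a `SpatRaster` with `terra` (the file name below matches the one written above):

```{r}
# read the saved DEM back in as a terra SpatRaster
elevation <- terra::rast("data/elevation_1km.tif")

# printing shows dimensions, resolution, extent, and CRS
elevation
```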
For this workshop we are also going to work with a land cover raster data set from the National Land Cover Database (NLCD). You can download NLCD data directly from R using the [`FedData`](https://docs.ropensci.org/FedData/) package; however, the most recent land cover data available there is from 2011. For this course, NLCD 2019 CONUS data was downloaded to my local system from the [MRLC website](https://www.mrlc.gov/data/nlcd-2019-land-cover-conus). The following code is how I read in and cleaned the land cover data for use in this workshop. Processing includes cropping the CONUS data set to the state of Colorado, then aggregating the raster (reducing its resolution) from 30 m to \~1 km (990 m) to make processing and analysis quicker for the workshop.
```{r}
#| eval: false
# read in the raster (image) file
land <- terra::rast('L:/Projects_active/EnviroScreen/data/NLCD/Land Cover/nlcd_2019_land_cover_l48_20210604.img')

# transform the counties spatial object to match the land cover CRS
# so we can perform crop and mask operations
counties_aea <- st_transform(counties, crs(land))

# crop and mask the land cover to Colorado
land_co <- land %>%
  terra::crop(vect(counties_aea)) %>%
  terra::mask(vect(counties_aea))

# aggregate to ~1km for ease of processing/analysis in the course
land_co1km <- terra::aggregate(land_co, fact = 33, fun = "modal")

# save the processed raster file
terra::writeRaster(land_co1km, filename = "data/NLCD_CO.tif", overwrite = TRUE)
```
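After aggregating a categorical raster it's worth checking the class distribution that survived. A quick hedged sketch using `terra::freq()`, which tabulates cell counts per value:

```{r}
# tabulate how many ~1km cells fall in each land cover class
terra::freq(land_co1km)
```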
## Point Data
We will be working with some species occurrences in the form of point data (latitude/longitude). `rgbif` is a package that allows you to download species occurrences from the [Global Biodiversity Information Facility (GBIF)](https://www.gbif.org/), a database of global species occurrences with over 2.2 billion records.
*Photos (left to right): Elk, Yellow-bellied Marmot, Western Tiger Salamander.*
```{r}
# make a vector of species names to use in the `occ_data()` function
species <- c("Cervus canadensis", "Marmota flaviventris", "Ambystoma mavortium")

# also make a vector of common names to use for plotting later
common_name <- c("Elk", "Yellow-bellied Marmot", "Western Tiger Salamander")
```
```{r}
# write a for loop to extract occurrence data for each species

# create an empty list to store each species' downloaded occurrence data
occ <- vector("list", length = length(species))

for (i in 1:length(occ)) {
  occ[[i]] <-
    occ_data(
      scientificName = species[i],
      hasCoordinate = TRUE,
      geometry = st_bbox(counties),
      limit = 2000
    ) %>%
    .$data # return just the data frame

  # add a common name column as an ID to use later
  occ[[i]]$ID <- common_name[i]

  # clean by removing duplicate occurrences
  occ[[i]] <- occ[[i]] %>%
    distinct(decimalLatitude, decimalLongitude, .keep_all = TRUE) %>%
    dplyr::select(Species = ID,
                  decimalLatitude,
                  decimalLongitude,
                  year,
                  month,
                  basisOfRecord) # only keep relevant variables

  print(i) # prints each element once it's finished so you can track progress
}
# Bind all data frames together
occ <- bind_rows(occ)
```
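Before saving, it can help to verify how many cleaned occurrence records were returned per species. A minimal check with `dplyr` (already loaded above):

```{r}
# count cleaned occurrence records per species
occ %>%
  dplyr::count(Species)
```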
```{r}
write_csv(occ, "data/species_occ.csv")
```
## Line Data
Use the `tigris` package to download linear water features (streams/rivers, braided streams, canals, ditches, artificial paths, and aqueducts) and roads for Larimer County.
```{r}
rivers <- linear_water(state = "CO", county = "Larimer")
roads <- roads(state = "CO", county = "Larimer")
# save the files
st_write(rivers, "data/rivers.shp")
st_write(roads, "data/roads.shp")
```
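A quick look at what came back; both objects are `sf` data frames of line features, so a sketch like this shows their size and shape:

```{r}
# how many line features of each type were downloaded?
nrow(rivers)
nrow(roads)

# plot the river network for Larimer County
plot(sf::st_geometry(rivers), col = "steelblue")
```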
## Polygon Data
We already imported a spatial polygon data set, `counties`. We are going to work with one more polygon data set from the `tigris` package which includes the boundaries of individual urban areas and clusters across Colorado. We have to do a little data cleaning to filter just urban areas within the state of Colorado.
```{r}
urban <- urban_areas() %>%
  tidyr::separate(col = NAME10, sep = ", ", into = c("city", "state")) %>%
  dplyr::filter(state == "CO")

# save the file
st_write(urban, "data/urban_areas.shp")
```
Finally, we are going to work with some census variables, namely total population and median household income. You can import census data with the `tidycensus` package, which requires an API key; you can obtain one for free [here](http://api.census.gov/data/key_signup.html). We import these variables using the `get_acs()` function ("acs" stands for American Community Survey), specifying the state of Colorado and that results be returned at the county level. We import just the county-level tabular data here since we will tie these attributes to our spatial data later, but you could also import these results as spatial objects by setting `geometry = TRUE`.
```{r}
# supply your unique API key
census_api_key("PASTE YOUR API KEY HERE")

# import total population and median household income
census <- get_acs(geography = "county", state = "CO", year = 2019,
                  variables = c("B01003_001", "B19013_001"), output = "wide")

# clean the data: keep the estimate ("E") columns, dropping the margins of error
census <- census %>%
  dplyr::select(contains("E")) %>%
  rename(total_pop = B01003_001E, med_income = B19013_001E)

# save it
write_csv(census, "data/census_data.csv")
```
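As noted above, these county-level attributes get tied to our spatial data later. A hedged sketch of that join: both `tigris` and `tidycensus` return a county `GEOID` column, which serves as the join key (dropping `NAME` from `census` first avoids a duplicate-column clash with the `counties` object):

```{r}
# join census attributes to the county polygons by their shared GEOID
counties_census <- counties %>%
  dplyr::left_join(dplyr::select(census, -NAME), by = "GEOID")
```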
## Other Data Libraries
R's collection of data retrieval libraries is extensive. We only use a few of them in this workshop, but I wanted to mention a few other packages that may be of interest:
| | |
|-------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| [`rnaturalearth`](https://docs.ropensci.org/rnaturalearth/articles/rnaturalearth.html) | Natural Earth spatial data |
| [`rnoaa`](https://docs.ropensci.org/rnoaa/articles/rnoaa.html) | NOAA weather data |
| [`dataRetrieval`](https://cran.r-project.org/web/packages/dataRetrieval/vignettes/dataRetrieval.html) | USGS water data |
| [`wdpar`](https://cran.r-project.org/web/packages/wdpar/vignettes/wdpar.html) | World Database on Protected Areas |
| [`rgee`](https://github.com/r-spatial/rgee) | use Google Earth Engine (and connect to the entire data collection) in R |
| [`nhdplusTools`](https://usgs-r.github.io/nhdplusTools/) | Hydrographic data |
| [`FedData`](https://github.com/ropensci/FedData) | Spatial data from several U.S. federal data sources, such as elevation, hydrography, soil, land cover, cropland, and climate data sets. |