---
output: github_document
---
# PHAB
#### *Marcus W. Beck, marcusb@sccwrp.org, Raphael D. Mazor, raphaelm@sccwrp.org, Andrew C. Rehn, andy.rehn@wildlife.ca.gov*
[AppVeyor build status](https://ci.appveyor.com/project/SCCWRP/PHAB)
[Travis CI build status](https://travis-ci.org/SCCWRP/PHAB)
[DOI](https://zenodo.org/badge/latestdoi/108920024)
The PHAB package contains materials to calculate an index of physical integrity (IPI) for California streams. This index is estimated using site-specific data and available metrics of physical habitat to describe overall integrity.
This tutorial assumes that the user is familiar with basic operations in the R programming language, such as data import, export, and manipulation. Although not required, we recommend using an integrated development environment for R, such as RStudio, which can be downloaded at [http://www.rstudio.com](http://www.rstudio.com). New users are encouraged to pursue training opportunities, such as those hosted by local R user groups. A list of such groups may be found here: http://blog.revolutionanalytics.com/local-r-groups.html. R training material developed by SCCWRP can also be found online: [https://sccwrp.github.io/SCCWRP_R_training/](https://sccwrp.github.io/SCCWRP_R_training/)
# Installation
Install the package as follows:
```{r, eval = FALSE}
install.packages('devtools')
library(devtools)
install_github('SCCWRP/PHAB')
library(PHAB)
```
# Basic usage
The core function is `IPI()`, which requires station and PHAB metric data as input.
```{r, echo = F, message = F, warning = F}
devtools::load_all()
```
```{r}
IPI(stations, phab)
```
# Detailed usage
The PHAB package can be installed from the R console with just a few lines of code. The current version of the package can be found on SCCWRP's GitHub page [here](https://github.com/SCCWRP/PHAB) and can be installed using the devtools package. The devtools package must be installed before the PHAB package can be installed. Start by installing and loading devtools:
```{r eval = F}
install.packages('devtools')
library(devtools)
```
Now the `install_github()` function from devtools can be used to install PHAB from GitHub. The package can be loaded after installation.
```{r eval = F}
install_github('SCCWRP/PHAB')
library(PHAB)
```
The installation process may take a few seconds. Both the devtools and PHAB packages depend on other R packages, all of which are installed together. If an error is encountered during installation, an informative message is usually printed in the R console. This information can help troubleshoot the problem, such as identifying which dependent packages may need to be installed separately. Please contact SCCWRP staff if additional errors are encountered.
After the package is successfully installed, you will be able to view the help file for the PHAB core function:
```{r, eval = F}
?IPI
```
## Preparing input data
The IPI is estimated using station and PHAB metric data as input. Station GIS data can be obtained using the GIS instructions that accompany this document, and PHAB metric data can be obtained from the state of California SWAMP reporting module. Sample data are provided with the PHAB package to demonstrate the required format and are available automatically once the package is installed and loaded (i.e., just type the name of the sample data in the R console to view). You can view the `stations` and `phab` example data from the R console:
```{r}
head(stations)
head(phab)
```
The `stations` data include site-specific information derived from geospatial analysis. These data are in wide format where one row corresponds to data for one site. The following fields are required:
* `StationCode` unique station identifier
* `MAX_ELEV` maximum elevation in the watershed in meters
* `AREA_SQKM` watershed area in square kilometers
* `ELEV_RANGE` elevation range of the watershed in meters
* `MEANP_WS` mean phosphorus geology from the watershed
* `New_Long` site longitude, decimal degrees
* `SITE_ELEV` site elevation
* `KFCT_AVE` average soil erodibility factor
* `New_Lat` site latitude, decimal degrees
* `MINP_WS` minimum phosphorus geology from the watershed
* `PPT_00_09` average precipitation (2000 to 2009) at the sample point, in hundredths of millimeters
The `phab` data include calculated physical habitat metrics that are compiled along with the `stations` data to get the IPI score. These data are in long format where multiple rows correspond to physical habitat metric values for a single site. The following fields are required:
* `StationCode` unique station identifier
* `SampleDate` date of the sample
* `SampleAgencyCode` code of the agency that collected the sample
* `Variable` name of the PHAB metric
* `Result` resulting metric value
* `Count_Calc` number of unique observations that were used to estimate the value in `Result`
Each value in the `Variable` column of the `phab` data names the PHAB metric whose measured value is given in the `Result` column of the same row. The following PHAB metrics are required for every unique sampling event, as defined by `StationCode` and `SampleDate`:
* `XSLOPE` mean slope of reach
* `XBKF_W` mean bankfull width
* `H_AqHab` Shannon diversity of aquatic habitat types
* `PCT_SAFN` percent sand and fine (<2 mm) substrate particles
* `XCMG` riparian cover sum of three layers
* `Ev_FlowHab` evenness of flow habitat types
* `H_SubNat` Shannon diversity of natural substrate types
* `XC` mean upper canopy trees and saplings
* `PCT_POOL` percent pools in reach
* `XFC_ALG` mean filamentous algae cover
* `PCT_RC` percent concrete/asphalt
Each PHAB metric serves a specific purpose in the calculation and reporting of the IPI. Some metrics (i.e., `H_AqHab`, `PCT_SAFN`, `XCMG`, `Ev_FlowHab`, and `H_SubNat`) are used to assess habitat condition. Other metrics (i.e., `XSLOPE`, `XBKF_W`, and `PCT_RC`) are used as predictors or score modifiers for components of the IPI. Finally, some metrics (i.e., `XC`, `PCT_POOL`, and `XFC_ALG`) are required because they are used for quality assurance checks.
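As a rough illustration of the roles described above, the sketch below groups the metric names by purpose and reports which scoring or predictor metrics are absent for each sampling event. The helper `missing_metrics()` and the toy data are hypothetical and are not part of the PHAB package; `IPI()` performs its own checks.

```r
# Metric names grouped by role, as listed in this document
scored_metrics    <- c("H_AqHab", "PCT_SAFN", "XCMG", "Ev_FlowHab", "H_SubNat")
predictor_metrics <- c("XSLOPE", "XBKF_W", "PCT_RC")
qa_metrics        <- c("XC", "PCT_POOL", "XFC_ALG")

# Hypothetical helper (not part of PHAB): for each sampling event, list the
# required metrics that are absent from the Variable column
missing_metrics <- function(phab, required = c(scored_metrics, predictor_metrics)) {
  events <- split(phab$Variable, paste(phab$StationCode, phab$SampleDate))
  gaps <- lapply(events, function(v) setdiff(required, v))
  gaps[lengths(gaps) > 0]
}

# Toy data (made up): the second event lacks PCT_RC
toy_phab <- data.frame(
  StationCode = rep(c("A", "B"), times = c(8, 7)),
  SampleDate  = rep(c("2016-05-01", "2016-06-01"), times = c(8, 7)),
  Variable    = c(scored_metrics, predictor_metrics,
                  scored_metrics, "XSLOPE", "XBKF_W"),
  Result      = 1,
  stringsAsFactors = FALSE
)

missing_metrics(toy_phab)
```

The quality assurance metrics are left out of the default `required` set here only to keep the example short; pass `required = c(scored_metrics, predictor_metrics, qa_metrics)` to include them.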
All required fields for the `stations` and `phab` data are case-sensitive and must be spelled correctly. The order of the fields does not matter. All `StationCode` values must be shared between the datasets. As described below, the `IPI()` function automatically checks the format of the input data prior to estimating scores.
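Before calling `IPI()`, the field-name and station-code requirements above can be checked by hand. The following is a minimal sketch in base R using made-up data; `check_fields()` is a hypothetical helper, not a function exported by PHAB.

```r
# Required field names, copied from the lists in this document
station_fields <- c("StationCode", "MAX_ELEV", "AREA_SQKM", "ELEV_RANGE",
                    "MEANP_WS", "New_Long", "SITE_ELEV", "KFCT_AVE",
                    "New_Lat", "MINP_WS", "PPT_00_09")
phab_fields <- c("StationCode", "SampleDate", "SampleAgencyCode",
                 "Variable", "Result", "Count_Calc")

# Hypothetical helper (not part of PHAB): report missing fields and
# station codes that appear in only one of the two tables
check_fields <- function(stations, phab) {
  list(
    missing_station_fields = setdiff(station_fields, names(stations)),
    missing_phab_fields    = setdiff(phab_fields, names(phab)),
    unmatched_stations     = union(
      setdiff(stations$StationCode, phab$StationCode),
      setdiff(phab$StationCode, stations$StationCode)
    )
  )
}

# Toy inputs (made up): `max_elev` has the wrong case and station "C"
# has no PHAB data
toy_stations <- data.frame(StationCode = c("A", "B", "C"),
                           max_elev = c(100, 200, 300),
                           stringsAsFactors = FALSE)
toy_phab <- data.frame(StationCode = c("A", "B"), SampleDate = "2016-05-01",
                       SampleAgencyCode = "XYZ", Variable = "XSLOPE",
                       Result = 1, Count_Calc = 10, stringsAsFactors = FALSE)

check_fields(toy_stations, toy_phab)
```

Because `setdiff()` compares names exactly, a field such as `max_elev` is reported as missing `MAX_ELEV`, which mirrors the case-sensitivity noted above.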
## Detailed metric descriptions
Five of the required PHAB metrics in the input data are used directly for scoring the IPI, whereas the remainder serve a supporting role as predictors or modifiers for different parts of the complete index. Understanding what each of the five core metrics describes about stream condition and how it varies with disturbance is critical for interpreting the index. Below is a detailed description of each metric (excerpted from the [tech memo](https://www.waterboards.ca.gov/water_issues/programs/swamp/bioassessment/docs/physical_habitat_index_technical_memo.pdf), click for more detail).
*Shannon diversity of natural instream cover*. `H_AqHab` measures the relative quantity and variety of natural structures in the stream, such as cobble, large and small boulders, fallen trees, logs and branches, and undercut banks available as refugia, or as sites for feeding or spawning and nursery functions of aquatic macrofauna. A wide variety and/or abundance of submerged structures in the stream provides macroinvertebrates and fish with a large number of niches, thus increasing habitat diversity. When variety and abundance of cover decreases (e.g., due to hydromodification, increased sedimentation, or active stream clearing), habitat structure becomes monotonous, diversity decreases, and the potential for recovery following disturbance decreases. Snags and submerged logs, especially old logs that have remained in place for several years, are among the most productive habitat structures for macroinvertebrate colonization and fish refugia in low-gradient streams.
*Percent sand and fine substrate*. `PCT_SAFN` measures the amount of small-grained sediment particles (i.e., <2 mm) that have accumulated in the stream bottom as a result of deposition. Deposition may result from soil disturbance in the catchment, landslides, and bank erosion. Sediment deposition may cause the formation of islands or point bars, filling of runs and pools, and embeddedness of gravel, cobble, boulders, and snags, with larger substrate particles covered or sunken into the silt, sand, or mud of the stream bottom. As habitat provided by cobbles or woody debris becomes embedded, and as interstitial spaces become inundated by sand or silt, the surface area available to macroinvertebrates and fish is decreased. High levels of sediment deposition are symptoms of an unstable and continually changing environment that becomes unsuitable for many organisms. Human activity may also deplete sands and fines (e.g., by upstream dam operations), and this depletion may harm aquatic life, but the IPI treats only increases in this metric as a negative impact on habitat quality. In addition, a *post-hoc* correction was made whereby the metric percent concrete (`PCT_RC`) is added to `PCT_SAFN` before scoring.
*Shannon diversity of natural substrate types*. `H_SubNat` measures the diversity of natural substrate types, assessing how well multiple size classes (e.g., gravel, cobble and boulder particles) are represented. In a stream with high habitat quality for benthic macroinvertebrates, layers of cobble and gravel provide diversity of niche space. Occasional patches of fine sediment, root mats and bedrock also provide important habitat for burrowers or clingers, but do not dominate the streambed. Lack of substrate diversity, e.g., where >75% of the channel bottom is dominated by one particle size or hard-pan, or with highly compacted particles with no interstitial space, represents poor physical conditions. Riffles and runs with a diversity of particle sizes often provide the most stable habitat in many small, high-gradient streams.
*Evenness of flow habitat types*. `Ev_FlowHab` measures the evenness of riffles, pools, and other flow microhabitat types. Optimal physical conditions include a relatively even mix of velocity/depth regimes, with regular alternation between riffles (fast-shallow), runs (fast-deep), glides (slow-shallow) and pools (slow-deep). Poor conditions occur when a single microhabitat dominates (usually glides, with pools and riffles absent). A stream that has a uniform flow regime will typically support far fewer types of organisms than a stream that has a variety of alternating flow regimes. Riffles in particular are a source of high-quality habitat and diverse fauna, and their regular occurrence along the length of a stream greatly enhances the diversity of the stream community. Pools are essential for many fish and amphibians.
*Riparian vegetation cover, sum of three layers*. `XCMG` measures the amount of vegetative protection afforded to the stream bank and the near-stream portion of the riparian zone. The root systems of plants growing on stream banks help hold soil in place, thereby reducing the amount of erosion likely to occur. The vegetative zone also serves as a buffer to pollutants entering a stream from runoff and provides shading and habitat and nutrient input into the stream. Banks that have full, multi-layered, natural plant growth are better for fish and macroinvertebrates than are banks without vegetative protection or those shored up with concrete or riprap. Vegetative removal and reduced riparian zones occur when roads, parking lots, fields, lawns, bare soil, riprap, or buildings are near the stream bank. Residential developments, urban centers, golf courses, and high grazing pressure from livestock are the common causes of anthropogenic degradation of the riparian zone. Even in undeveloped areas, upstream hydromodification and invasion by non-native species can reduce the cover and quality of riparian zone vegetation.
## Using the `IPI()` function
The IPI score for a site is estimated from the station and PHAB data. The score is estimated automatically by the `IPI()` function in the package following several steps. First, reference expectations for a site are estimated for predictive metrics using the station data. Then, observed data values are compared to reference expectations for predictive metrics and the differences between observed and predicted (i.e., metric residuals) are used for scoring. For metrics that are not predicted, raw metric values are used for scoring. Metric scores are based on the upper and lower percentiles of either metric residuals or raw metric values observed at reference and high-activity sites. The metric scores are then summed and standardized (i.e., divided) by the mean sum of scores at reference sites to obtain the final IPI score.
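The arithmetic of the last two steps can be sketched as follows. This is only an illustration under stated assumptions: it assumes a linear rescaling between a lower and an upper bound, with scores truncated to the range 0 to 1, and every bound, metric value, and reference mean below is a made-up number. The function `score_metric()` is hypothetical and is not the scoring code used by `IPI()`; see the tech memo for the actual procedure.

```r
# Hypothetical scoring function (an assumption, not the PHAB implementation):
# rescale a value between lower and upper bounds and truncate to [0, 1]
score_metric <- function(x, lower, upper) {
  pmin(pmax((x - lower) / (upper - lower), 0), 1)
}

# Made-up raw values or residuals for five metrics at one site
site_values <- c(m1 = 0.5, m2 = 1.2, m3 = -0.1, m4 = 0.75, m5 = 0.25)

# Made-up bounds standing in for percentiles at stressed and reference sites
lower <- 0
upper <- 1

metric_scores <- score_metric(site_values, lower, upper)

# Sum the metric scores and divide by the mean sum of scores at reference sites
ref_mean_sum <- 4  # made-up value
ipi <- sum(metric_scores) / ref_mean_sum
ipi
```

Dividing by the reference mean is what places a typical reference site near a score of 1 in this sketch.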
The `IPI()` function can be used on station and PHAB data that are correctly formatted as shown above. The `stations` and `phab` example data are in the correct format and are loaded automatically with the PHAB package. These data are used here to demonstrate use of the `IPI()` function.
```{r}
IPI(stations, phab)
```
A data frame of IPI scores estimated at each site on each unique sample date is returned. The output data are in wide format with one row for each sample date at a site. Detailed information for each output column is as follows:
* `StationCode` unique station identifier
* `SampleDate` date of the site visit
* `PHAB_SampleID` unique identifier of the sampling event. Typically, the station code and sample date are sufficient to determine unique sampling events.
* `IPI` score for the index of physical integrity
* `IPI_percentile` the percentile of the IPI score relative to scores at reference sites
* `Ev_FlowHab` evenness of flow habitat types, from the raw PHAB metric
* `Ev_FlowHab_score` IPI score for evenness of flow habitat types
* `H_AqHab` Shannon diversity of natural instream cover types, from the raw PHAB metric
* `H_AqHab_pred` predicted Shannon diversity of natural instream cover types
* `H_AqHab_score` scored Shannon diversity of natural instream cover types
* `H_SubNat` Shannon diversity of natural substrate types, from the raw PHAB metric
* `H_SubNat_score` scored Shannon diversity of natural substrate types
* `PCT_SAFN` percent sand and fine substrate, from the raw PHAB metric
* `PCT_RC` percent concrete/asphalt, from the raw PHAB metric; combined with `PCT_SAFN` to provide an overall estimate of substrate with poor suitability for macrofauna
* `PCT_SAFN_pred` predicted percent sand and fine substrate
* `PCT_SAFN_score` scored percent sand and fine substrate, includes `PCT_RC`
* `XCMG` riparian cover as sum of three layers, from the raw PHAB metric
* `XCMG_pred` predicted riparian cover as sum of three layers
* `XCMG_score` scored riparian cover as sum of three layers
* `IPI_qa` quality assurance for the IPI score
* `Ev_FlowHab_qa` quality assurance for evenness of flow habitat types
* `H_AqHab_qa` quality assurance for Shannon diversity of aquatic habitat types
* `H_SubNat_qa` quality assurance for Shannon diversity of natural substrate types
* `PCT_SAFN_qa` quality assurance for percent sand and fine substrate
* `XCMG_qa` quality assurance for riparian cover as sum of three layers
Metrics are included in the output as observed PHAB metrics, predicted metrics (where applicable), and scored metrics. Observed PHAB metrics are returned as-is from the input data. Some PHAB metrics include a predicted column that shows the modelled metric value based on the environmental setting at a site. Scored PHAB metrics are obtained following the description above.
The last six columns include quality assurance (QA) information for the IPI score and select metrics. QA values less than one indicate lower quality assurance, usually because metric values were calculated from fewer measurements than specified by field protocols. `IPI_qa` (the overall QA value for the IPI) is based on the lowest QA value among all metrics. At this time, there is no criterion for flagging an IPI score based on QA measurements; analysts are advised to use their own judgment when evaluating IPI scores with low QA measurements.
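One way to find scores that deserve a closer look is to filter on the QA columns. The sketch below uses a small made-up data frame that carries only a few of the output columns listed above; with real results, the same steps would apply to the data frame returned by `IPI()`. The threshold of one follows the description above and is not an official flagging rule.

```r
# Toy stand-in for IPI() output, with a subset of the columns listed above
toy_scores <- data.frame(
  StationCode = c("A", "B", "C"),
  IPI         = c(1.05, 0.62, 0.88),
  IPI_qa      = c(1, 0.5, 1),
  XCMG_qa     = c(1, 0.5, 1),
  PCT_SAFN_qa = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Rows where the overall QA value is below one
low_qa <- toy_scores[toy_scores$IPI_qa < 1, ]

# Metric QA columns (names ending in "_qa") that are below one in those rows
qa_cols <- setdiff(grep("_qa$", names(toy_scores), value = TRUE), "IPI_qa")
low_metrics <- qa_cols[colSums(low_qa[, qa_cols, drop = FALSE] < 1) > 0]

low_qa$StationCode
low_metrics
```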
## Interpreting IPI scores
The IPI was calibrated during its development so that the mean score of reference sites is 1; IPI scores near 1 represent locations with conditions similar to reference sites. Scores that approach 0 indicate great departure from reference condition and degradation of physical condition. Scores > 1 can be interpreted to indicate greater physical complexity than predicted for a site given its natural environmental setting. All metric scores are weighted equally to determine the overall IPI score. For observed and scored PHAB metrics, all are expected to decrease under degraded physical conditions, except `PCT_SAFN`, which is expected to increase in response to degradation.
## Calibration data
An additional data file is available within the PHAB package that shows calibration data for scoring the IPI metrics. This file is called `refcal` and includes observed and predicted scores at reference and high-activity (or "stressed") sites for the five PHAB metrics. Metrics are scored based on deviation from the 5th and 95th percentile of scores at reference or calibration sites. The `refcal` dataset includes observations at these sites that were used to identify percentile cutoffs for estimating metric scores. The dataset is loaded automatically with the PHAB package; type its name in the console to view it.
```{r}
head(refcal)
```
* `Variable` name of the PHAB metric
* `StationCode` unique station identifier
* `SampleID2` unique identifier of the sampling event
* `SiteSet` indicating if a site was reference or stressed
* `Result` resulting metric value
* `Predicted` predicted metric value
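To illustrate what percentile cutoffs of this kind look like, the sketch below computes the 5th and 95th percentiles of `Result` for each metric and site set. It runs on made-up data that borrows the column names of `refcal`; the values and the `SiteSet` labels are assumptions and may differ from the packaged dataset, and the calculation is not necessarily how the package derived its own cutoffs.

```r
# Toy data with the same column names as `refcal`; the values and the
# SiteSet labels are made up and may differ from the packaged data
toy_refcal <- data.frame(
  Variable = "XCMG",
  SiteSet  = rep(c("Reference", "Stressed"), each = 101),
  Result   = c(seq(0, 100), seq(0, 50, by = 0.5)),
  stringsAsFactors = FALSE
)

# 5th and 95th percentiles of Result for each metric and site set
cutoffs <- aggregate(
  Result ~ Variable + SiteSet,
  data = toy_refcal,
  FUN = function(x) quantile(x, probs = c(0.05, 0.95))
)
cutoffs
```

Because the summary function returns two values, `aggregate()` stores them as a two-column matrix in `cutoffs$Result`, with one row per combination of `Variable` and `SiteSet`.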
## Error checks for input data
The `IPI()` function will evaluate both the `stations` and `phab` input datasets for correct format before estimating IPI scores. IPI scores will not be calculated if any errors are encountered. The following checks are made:
* No duplicate station codes in `stations`. That is, input data have one row per station.
* All station codes in `stations` are in `phab`, and vice-versa
* All required fields are present in `stations` and `phab` (see above)
* All required PHAB metrics are present in the `Variable` field of `phab` for each station and sample date (see above). An exception is made for `XC`, `PCT_POOL`, and `XFC_ALG`, which are not necessary for calculating the IPI but are used for optional quality assurance checks.
* No duplicate results for PHAB variables at each station and sample date. That is, one row per station, date, and PHAB metric.
* All input variables for `stations` and `phab` are non-negative, excluding elevation variables in `stations` which may be negative if below sea level (i.e., some locations in southeast California). Moreover, the variables `XBKF_W` and `Ev_FlowHab` in `phab` must also be greater than zero.
The `IPI()` function will print informative messages to the R console if any of these errors are encountered. It is the responsibility of the analyst to correct any errors in the raw data before proceeding.
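Some of these problems can be located in the raw data before calling `IPI()`. The following sketch mirrors the duplicate and non-negative checks in base R, using made-up data; the three helper functions are hypothetical, are not part of PHAB, and do not reproduce the package's own messages.

```r
# Hypothetical pre-checks (not part of PHAB) mirroring some of the checks above

# Station codes that occur on more than one row of the stations table
duplicated_stations <- function(stations) {
  unique(stations$StationCode[duplicated(stations$StationCode)])
}

# Station, date, and metric combinations that occur more than once
duplicated_results <- function(phab) {
  key <- phab[, c("StationCode", "SampleDate", "Variable")]
  unique(key[duplicated(key), , drop = FALSE])
}

# Rows of the PHAB table with a negative metric value
negative_results <- function(phab) {
  phab[!is.na(phab$Result) & phab$Result < 0, , drop = FALSE]
}

# Toy data (made up): station "B" is duplicated, XSLOPE is repeated for
# station "A", and one result is negative
toy_stations <- data.frame(StationCode = c("A", "B", "B"),
                           stringsAsFactors = FALSE)
toy_phab <- data.frame(
  StationCode = c("A", "A", "B"),
  SampleDate  = "2016-05-01",
  Variable    = c("XSLOPE", "XSLOPE", "PCT_POOL"),
  Result      = c(1.5, 1.5, -2),
  stringsAsFactors = FALSE
)

duplicated_stations(toy_stations)
duplicated_results(toy_phab)
negative_results(toy_phab)
```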