| 1 | +Pilot Walkthrough |
| 2 | +================ |
| 3 | +Kristina Riemer |
| 4 | + |
| 5 | +### Intro |
| 6 | + |
| 7 | +Using data from the TERRA-REF project. Get and plot trait data, and then
| 8 | +do the same for weather data.
| 9 | + |
| 10 | +Will be live coding, so you can follow along on your own machine if you
| 11 | +want.
| 12 | + |
| 13 | +Will be using R + RStudio, and the following R packages: traits to get data,
| 14 | +dplyr and lubridate for data cleaning, ggplot2 for plotting trait data.
| 15 | + |
| 16 | +Full tutorials for these at: terraref.github.io/tutorials/. I can also |
| 17 | +send the code used here specifically if people want it. |
| 18 | + |
| 19 | +### Traits download |
| 20 | + |
| 21 | +Set some global options for the function used to get data. Using a subset
| 22 | +of data that’s publicly available, so don’t need a personal API key. Will
| 23 | +need to find and use your own API key to access other data.
| 24 | + |
| 25 | +``` r |
| 26 | +options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/", |
| 27 | + betydb_api_version = 'beta', |
| 28 | + betydb_key = '9999999999999999999999999999999999999999') |
| 29 | +``` |
| 30 | + |
| 31 | +Using traits R package. Function is betydb\_query, which works for several
| 32 | +datasets including TERRA-REF.
| 33 | + |
| 34 | +Pulling data from Season 4, only a subset using limit because there’s a |
| 35 | +lot of it. |
| 36 | + |
| 37 | +``` r |
| 38 | +library(traits) |
| 39 | +``` |
| 40 | + |
| 41 | + ## Registered S3 method overwritten by 'httr': |
| 42 | + ## method from |
| 43 | + ## as.character.form_file crul |
| 44 | + |
| 45 | + ## Registered S3 method overwritten by 'hoardr': |
| 46 | + ## method from |
| 47 | + ## print.cache_info httr |
| 48 | + |
| 49 | +``` r |
| 50 | +season_4 <- betydb_query(sitename = "~Season 4", limit = 1000) |
| 51 | +``` |
| 52 | + |
| 53 | +Look at dataframe. |
| 54 | + |
| 55 | +Look at just the traits available; canopy\_height is one. Using dplyr, a
| 56 | +data cleaning R package.
| 57 | + |
| 58 | +``` r |
| 59 | +library(dplyr) |
| 60 | +``` |
| 61 | + |
| 62 | + ## |
| 63 | + ## Attaching package: 'dplyr' |
| 64 | + |
| 65 | + ## The following objects are masked from 'package:stats': |
| 66 | + ## |
| 67 | + ## filter, lag |
| 68 | + |
| 69 | + ## The following objects are masked from 'package:base': |
| 70 | + ## |
| 71 | + ## intersect, setdiff, setequal, union |
| 72 | + |
| 73 | +``` r |
| 74 | +season_4 %>% |
| 75 | + distinct(trait) %>% |
| 76 | + print(n = Inf) |
| 77 | +``` |
| 78 | + |
| 79 | + ## # A tibble: 40 x 1 |
| 80 | + ## trait |
| 81 | + ## <chr> |
| 82 | + ## 1 canopy_height |
| 83 | + ## 2 relative_chlorophyll |
| 84 | + ## 3 absorbance_730 |
| 85 | + ## 4 leaf_temperature |
| 86 | + ## 5 vH+ |
| 87 | + ## 6 light_intensity_PAR |
| 88 | + ## 7 SPAD_880 |
| 89 | + ## 8 SPAD_850 |
| 90 | + ## 9 SPAD_650 |
| 91 | + ## 10 leaf_angle_clamp_position |
| 92 | + ## 11 ambient_humidity |
| 93 | + ## 12 leaf_thickness |
| 94 | + ## 13 SPAD_730 |
| 95 | + ## 14 SPAD_605 |
| 96 | + ## 15 SPAD_530 |
| 97 | + ## 16 RFd |
| 98 | + ## 17 qP |
| 99 | + ## 18 qL |
| 100 | + ## 19 NPQt |
| 101 | + ## 20 Fs |
| 102 | + ## 21 absorbance_940 |
| 103 | + ## 22 absorbance_880 |
| 104 | + ## 23 absorbance_605 |
| 105 | + ## 24 absorbance_530 |
| 106 | + ## 25 PhiNPQ |
| 107 | + ## 26 PhiNO |
| 108 | + ## 27 roll |
| 109 | + ## 28 absorbance_850 |
| 110 | + ## 29 SPAD_420 |
| 111 | + ## 30 LEF |
| 112 | + ## 31 FoPrime |
| 113 | + ## 32 FmPrime |
| 114 | + ## 33 Phi2 |
| 115 | + ## 34 leaf_temperature_differential |
| 116 | + ## 35 ECSt |
| 117 | + ## 36 gH+ |
| 118 | + ## 37 FvP/FmP |
| 119 | + ## 38 proximal_air_temperature |
| 120 | + ## 39 pitch |
| 121 | + ## 40 absorbance_650 |
| 122 | + |
| 123 | +Want to look at just the trait values for this trait during a more |
| 124 | +recent season, season 6. Use same function but with another argument, |
| 125 | +trait. |
| 126 | + |
| 127 | +``` r |
| 128 | +canopy_height <- betydb_query(trait = "canopy_height", |
| 129 | + sitename = "~Season 6", |
| 130 | + limit = 250) |
| 131 | +``` |
| 132 | + |
| 133 | +Want to plot canopy height across time, but first have to get date into
| 134 | +correct format for plotting. Use ymd\_hms from the lubridate R package to
| 135 | +create new date column with correctly formatted dates.
| 136 | + |
| 137 | +``` r |
| 138 | +library(lubridate) |
| 139 | +``` |
| 140 | + |
| 141 | + ## |
| 142 | + ## Attaching package: 'lubridate' |
| 143 | + |
| 144 | + ## The following object is masked from 'package:base': |
| 145 | + ## |
| 146 | + ## date |
| 147 | + |
| 148 | +``` r |
| 149 | +canopy_height <- canopy_height %>% |
| 150 | + mutate(formatted_date = ymd_hms(raw_date)) |
| 151 | +``` |
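
A minimal sketch of what the date parsing step does, using made-up timestamps rather than TERRA-REF data. The strings below are assumptions chosen for illustration; the actual format of `raw_date` may differ.

```r
library(lubridate)

# Toy timestamps invented for this sketch (not taken from TERRA-REF)
raw <- c("2018-05-01 12:00:00", "2018-05-02T08:30:00Z")

# ymd_hms() turns year-month-day hour-minute-second strings into
# POSIXct date-times, defaulting to the UTC time zone
parsed <- ymd_hms(raw)

# A string that cannot be parsed becomes NA; quiet = TRUE suppresses
# the parse-failure warning
bad <- ymd_hms("not a date", quiet = TRUE)
```

Checking for `NA` values in the new column after parsing is a quick way to spot rows whose dates did not match the expected format.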
| 152 | + |
| 153 | +Plot canopy data. Using ggplot2 package.
| 154 | + |
| 155 | +Plot newly formatted date column on x-axis and canopy height value, in
| 156 | +the mean column, on y-axis.
| 157 | + |
| 158 | +``` r |
| 159 | +library(ggplot2) |
| 160 | +ggplot(data = canopy_height, aes(x = formatted_date, y = mean)) + |
| 161 | + geom_point() |
| 162 | +``` |
| 163 | + |
| 165 | + |
| 166 | +Add axis labels, finding units from dataframe. |
| 167 | + |
| 168 | +``` r |
| 169 | +ggplot(data = canopy_height, aes(x = formatted_date, y = mean)) + |
| 170 | + geom_point() + |
| 171 | + labs(x = "Date", y = "Plant height (cm)") |
| 172 | +``` |
| 173 | + |
| 175 | + |
| 176 | +How to get API key: |
| 177 | + |
| 178 | +1. Log into betydb.org |
| 179 | +2. Go to data/users |
| 180 | +3. See your account there with API key listed |
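
Once you have a key, one possible way to supply it without pasting it into a script is to read it from an environment variable. This is only a sketch: `BETYDB_KEY` and `use_betydb_key` are hypothetical names made up here; only the `betydb_key` option comes from the code above.

```r
# BETYDB_KEY is a hypothetical environment variable name chosen for this
# sketch; it could be set in ~/.Renviron so the key stays out of scripts
use_betydb_key <- function(env_var = "BETYDB_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (nzchar(key)) {
    # Same option name as in the options() call earlier in the walkthrough
    options(betydb_key = key)
    return(TRUE)
  }
  # No key found, so the existing option is left untouched
  FALSE
}
```

Calling `use_betydb_key()` returns `TRUE` if a key was found and set, and `FALSE` otherwise.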
| 181 | + |
| 182 | +### Weather download |
| 183 | + |
| 184 | +No special R package for getting weather data. Pull directly from |
| 185 | +Clowder. |
| 186 | + |
| 187 | +Data is in JSON format, so use the jsonlite R package to pull down data and
| 188 | +turn it into R data frame structure.
| 189 | + |
| 190 | +Create URL based on what part of data we want. Stream ID specifies
| 191 | +weather station, and then since and until set the date range. This URL
| 192 | +requests weather data for January 2017.
| 193 | + |
| 194 | +``` r |
| 195 | +library(jsonlite) |
| 196 | +weather <- fromJSON('https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=46431&since=2017-01-02&until=2017-01-31', flatten = FALSE) |
| 197 | +``` |
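
To make the pieces of that URL easier to change, it can be assembled from its parts. A minimal sketch in base R; the function name `build_weather_url` is hypothetical, and the parameter values are the ones from the URL above.

```r
# Hypothetical helper for this sketch: glue the query parameters onto the
# Clowder geostreams endpoint used in the walkthrough
build_weather_url <- function(stream_id, since, until) {
  base_url <- "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints"
  paste0(base_url,
         "?stream_id=", stream_id,
         "&since=", since,
         "&until=", until)
}

# Same station and date range as the hard-coded URL above
weather_url <- build_weather_url(46431, "2017-01-02", "2017-01-31")
```

The resulting string can be passed to `fromJSON()` in place of the literal URL, and changing the date range only means changing the `since` and `until` arguments.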
| 198 | + |
| 199 | +Pulling out subset of data called properties, which holds the weather variables.
| 200 | + |
| 201 | +Then same reformatting of date as before. Using end\_time column from |
| 202 | +weather dataset. |
| 203 | + |
| 204 | +``` r |
| 205 | +weather <- weather$properties %>% |
| 206 | + mutate(formatted_date = ymd_hms(weather$end_time)) |
| 207 | +``` |
| 208 | + |
| 209 | +Plot single variable, air temperature, across time. Turns out data is |
| 210 | +only for month of January. |
| 211 | + |
| 212 | +``` r |
| 213 | +ggplot(data = weather, aes(x = formatted_date, y = air_temperature)) + |
| 214 | + geom_point() + |
| 215 | + labs(x = "Date", y = "Temperature (K)") |
| 216 | +``` |
| 217 | + |
| 219 | + |
| 220 | +If we want to easily plot all 8 of the weather variables, need to |
| 221 | +rearrange data. It’s in wide format, need it in long. |
| 222 | + |
| 223 | +Remove a couple of unneeded columns. Then turn variable headers into a |
| 224 | +column and put their values in weather\_value column. |
| 225 | + |
| 226 | +``` r |
| 227 | +library(tidyr) |
| 228 | +weather_long <- weather %>% |
| 229 | + select(-source, -source_file) %>% |
| 230 | + gather(weather_variable, weather_value, -formatted_date) |
| 231 | +``` |
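
The wide-to-long step can be seen more clearly on a tiny example. A sketch with invented values (not real weather readings), showing `gather()` as used above next to `pivot_longer()`, the newer tidyr function for the same reshaping.

```r
library(tidyr)

# Toy wide data frame invented for this sketch: one row per time point,
# one column per weather variable
toy_wide <- data.frame(
  formatted_date = as.POSIXct(c("2017-01-02 00:00:00", "2017-01-02 01:00:00"),
                              tz = "UTC"),
  air_temperature = c(285.1, 284.7),
  wind_speed = c(1.2, 0.8)
)

# gather(): column names go into weather_variable, their values into
# weather_value; formatted_date is kept as an identifier column
toy_long <- gather(toy_wide, weather_variable, weather_value, -formatted_date)

# pivot_longer() (tidyr >= 1.0.0) produces the same columns
toy_long_alt <- pivot_longer(toy_wide,
                             cols = -formatted_date,
                             names_to = "weather_variable",
                             values_to = "weather_value")
```

Two time points with two variables each give four rows in the long format, one per date and variable combination.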
| 232 | + |
| 233 | +Can now easily plot all of them using ggplot2, with one panel per
| 234 | +weather variable.
| 235 | + |
| 236 | +``` r |
| 237 | +ggplot(data = weather_long, aes(x = formatted_date, y = weather_value)) + |
| 238 | + geom_point() + |
| 239 | + facet_wrap(~weather_variable, scales = "free_y") + |
| 240 | + labs(x = "Date", y = "Weather variable") |
| 241 | +``` |
| 242 | + |