PecanProject
diff --git a/vignettes/common_analyses.qmd b/vignettes/common_analyses.qmd
Lines changed: 183 additions & 0 deletions
diff --git a/vignettes/getting_started.qmd b/vignettes/getting_started.qmd
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,183 @@
+---
+title: "Common Analyses with betydata"
+vignette: >
+  %\VignetteIndexEntry{Common Analyses with betydata}
+  %\VignetteEngine{quarto::html}
+  %\VignetteEncoding{UTF-8}
+---
+
+::: {.callout-note}
+## What you will learn
+
+- How to extract and summarize yield data for specific genera
+- How to link management practices (fertilization, planting) to yield observations
+- Patterns for site-level aggregation, author-based queries, and variable lookups
+:::
+
+## Setup
+
+```{r}
+library(betydata)
+library(dplyr)
+```
+
+## Extracting Yield Data for a Genus {#sec-yields}
+
+A common starting point is pulling yield observations for a particular genus and summarizing them. The `Ayield` trait represents above-ground annual yield in Mg/ha.
+
+```{r}
+miscanthus_yields <- traitsview |>
+  filter(
+    genus == "Miscanthus",
+    trait == "Ayield"
+  ) |>
+  select(id, mean, date, sitename, scientificname)
+
+miscanthus_yields
+
+nrow(miscanthus_yields)
+```
+
+::: {.callout-tip}
+## Tibble Printing
+
+All tables are tibbles, so printing one shows only the first 10 rows and the columns that fit the console. In `traitsview`, the key columns (`trait`, `mean`, `units`, `scientificname`, `genus`) come first, so the default printout is informative without `head()` or column subsetting.
+:::
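+
+The section intro mentions summarizing the extracted yields. A minimal sketch of that step follows, using a small hand-made table, `toy_yields`, as a stand-in for `miscanthus_yields`; the values and site names are invented for illustration and are not BETYdb data.
+
+```{r}
+library(dplyr)
+
+# Hypothetical stand-in for miscanthus_yields (made-up values)
+toy_yields <- tibble(
+  id = 1:5,
+  mean = c(10, 20, 30, NA, 12),
+  sitename = c("Site A", "Site A", "Site B", "Site B", "Site C")
+)
+
+# Per-site record count, missing-value count, and mean yield;
+# missing values are dropped from the mean
+toy_summary <- toy_yields |>
+  summarise(
+    n = n(),
+    n_missing = sum(is.na(mean)),
+    mean_yield = mean(mean, na.rm = TRUE),
+    .by = sitename
+  ) |>
+  arrange(desc(mean_yield))
+
+toy_summary
+```
+
+The same `summarise()` call should work on `miscanthus_yields` itself, since that table keeps the `mean` and `sitename` columns.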
+
+## Working with Management Practices {#sec-management}
+
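+Management practices (planting dates, fertilization rates, harvest methods) are stored in the `managements` table and linked to experimental treatments through the `managements_treatments` junction table. Because each row of `traitsview` carries a `treatment_id`, joining on that key attaches management details to yield observations. A treatment can have several management records, so the join is many-to-many and may repeat a yield row once per matching record.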
+
+```{r}
+mgmt_treat <- managements_treatments |>
+  left_join(
+    managements |> select(id, mgmttype, level, units, date),
+    by = c("management_id" = "id")
+  )
+
+grass_yields <- traitsview |>
+  filter(
+    genus %in% c("Miscanthus", "Panicum"),
+    trait == "Ayield"
+  ) |>
+  left_join(mgmt_treat, by = "treatment_id", relationship = "many-to-many")
+
+grass_yields |>
+  filter(!is.na(mgmttype)) |>
+  count(genus, mgmttype, sort = TRUE)
+```
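+
+One thing to watch with a many-to-many join is that it can return more rows than the left-hand table. The sketch below illustrates this with two invented tables (`toy_obs` and `toy_mgmt`, including the hypothetical `obs_id` column); it is not output from betydata.
+
+```{r}
+library(dplyr)
+
+# Hypothetical yield observations: one row per observation
+toy_obs <- tibble(
+  obs_id = 1:3,
+  treatment_id = c(1, 1, 2),
+  mean = c(10, 12, 20)
+)
+
+# Hypothetical management records: treatment 1 has two, treatment 2 has one
+toy_mgmt <- tibble(
+  treatment_id = c(1, 1, 2),
+  mgmttype = c("planting", "fertilizer_N", "planting")
+)
+
+toy_joined <- toy_obs |>
+  left_join(toy_mgmt, by = "treatment_id", relationship = "many-to-many")
+
+# The join repeats observations that match several management records
+nrow(toy_obs)
+nrow(toy_joined)
+
+# Count distinct observations, not joined rows, per management type
+toy_counts <- toy_joined |>
+  summarise(n_obs = n_distinct(obs_id), .by = mgmttype)
+
+toy_counts
+```
+
+On the real data, `n_distinct(id)` would play the role of `n_distinct(obs_id)`, assuming `id` uniquely identifies a row of `traitsview`.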
+
+## Nitrogen Fertilization Rates {#sec-nitrogen}
+
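+Joining nitrogen application rates to yield data enables exploration of yield--nitrogen relationships. Nitrogen management is recorded as `fertilizer_N` or `fertilizer_N_rate` in the `mgmttype` column, with the rate in `level`. Because the join is many-to-many, a yield observation linked to several nitrogen records appears more than once, so the means below weight such observations more heavily.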
+
+```{r}
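+nitrogen_rates <- managements |>
+  filter(mgmttype %in% c("fertilizer_N", "fertilizer_N_rate")) |>
+  # inner_join drops managements with no linked treatment; a left_join would
+  # leave treatment_id = NA, which would then match NA keys in traitsview
+  inner_join(
+    managements_treatments |> select(management_id, treatment_id),
+    by = c("id" = "management_id")
+  ) |>
+  select(treatment_id, nrate = level, units)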
+
+yields_with_n <- traitsview |>
+  filter(
+    trait == "Ayield",
+    genus %in% c("Miscanthus", "Panicum")
+  ) |>
+  left_join(nitrogen_rates, by = "treatment_id", relationship = "many-to-many")
+
+yields_with_n |>
+  filter(!is.na(nrate)) |>
+  summarise(
+    n = n(),
+    mean_N = round(mean(nrate, na.rm = TRUE), 1),
+    mean_yield = round(mean(mean, na.rm = TRUE), 1),
+    .by = genus
+  ) |>
+  knitr::kable(col.names = c("Genus", "N obs", "Mean N rate", "Mean Yield (Mg/ha)"))
+```
+
+## Site-Level Aggregation {#sec-sites}
+
+Aggregating trait data by site is useful for spatial analysis and mapping data density across research locations.
+
+```{r}
+#| label: tbl-site-summary
+#| tbl-cap: "Top research sites by number of records"
+site_summary <- traitsview |>
+  filter(!is.na(lat), !is.na(lon)) |>
+  summarise(
+    n_records = n(),
+    n_traits = n_distinct(trait),
+    n_species = n_distinct(species_id),
+    .by = c(site_id, sitename, lat, lon)
+  )
+
+site_summary |>
+  arrange(desc(n_records)) |>
+  head(15) |>
+  knitr::kable()
+```
+
+::: {.callout-tip}
+## Geographic Data
+
+Site coordinates are stored in the `lat` and `lon` columns of both `traitsview` and the `sites` table; they are `NA` where a site has no recorded coordinates. The `sites` table also contains `mat` (mean annual temperature) and `map` (mean annual precipitation) for sites where climate data are available.
+:::
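+
+As one possible use of the climate columns, the sketch below attaches `mat` and `map` to a per-site summary and keeps only sites with both values. Both tables are invented, and the assumption that the `sites` key is called `id` (matching `site_id` in `traitsview`) should be checked against the real table.
+
+```{r}
+library(dplyr)
+
+# Hypothetical per-site summary, shaped like site_summary above
+toy_site_summary <- tibble(
+  site_id = c(1, 2, 3),
+  n_records = c(120, 45, 8)
+)
+
+# Hypothetical slice of a sites table; the key column `id` is an assumption
+toy_sites <- tibble(
+  id = c(1, 2, 3),
+  mat = c(11.2, NA, 24.0),
+  map = c(1040, NA, 1800)
+)
+
+# Attach climate columns, then drop sites without climate data
+toy_site_climate <- toy_site_summary |>
+  left_join(toy_sites, by = c("site_id" = "id")) |>
+  filter(!is.na(mat), !is.na(map))
+
+toy_site_climate
+```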
+
+## Finding Data by Author {#sec-author}
+
+```{r}
+lebauer_data <- traitsview |>
+  filter(grepl("LeBauer", author, ignore.case = TRUE))
+
+lebauer_data |>
+  count(trait, author, citation_year, sort = TRUE)
+```
+
+## Most Data-Rich Citations {#sec-citations}
+
+```{r}
+#| label: tbl-citations
+#| tbl-cap: "Top 10 citations by number of records"
+traitsview |>
+  count(citation_id, author, citation_year, sort = TRUE) |>
+  head(10) |>
+  knitr::kable()
+```
+
+## Variable and Trait Lookups {#sec-variables}
+
+The `variables` table provides units, descriptions, and valid ranges for each measured trait. This is useful for understanding what a trait measures and checking whether observed values are within expected bounds.
+
+```{r}
+variables |>
+  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
+  select(name, units, description, min, max) |>
+  knitr::kable()
+```
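+
+The bounds check mentioned above could look roughly like the following sketch. It uses invented observations and ranges (`toy_traits`, `toy_variables`) and assumes `min` and `max` are numeric; if the real `variables` table stores them as text, they would need converting with `as.numeric()` first.
+
+```{r}
+library(dplyr)
+
+# Hypothetical observations (made-up values)
+toy_traits <- tibble(
+  trait = c("SLA", "SLA", "Vcmax", "Ayield"),
+  mean = c(22.5, 950, 38.1, -3)
+)
+
+# Hypothetical ranges, shaped like the `variables` columns selected above
+toy_variables <- tibble(
+  name = c("SLA", "Vcmax", "Ayield"),
+  min = c(0, 0, 0),
+  max = c(100, 500, 100)
+)
+
+# Attach the range to each observation and flag values inside it
+toy_checked <- toy_traits |>
+  left_join(toy_variables, by = c("trait" = "name")) |>
+  mutate(in_range = mean >= min & mean <= max)
+
+# Rows outside the documented range
+toy_checked |>
+  filter(!in_range)
+```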
+
+## Performance {#sec-performance}
+
+Because all tables ship with the package and are held in memory as R data frames, filtering and joining need no database connection and incur no network overhead.
+
+```{r}
+system.time({
+  result <- traitsview |>
+    filter(
+      genus %in% c("Miscanthus", "Panicum", "Populus"),
+      trait %in% c("SLA", "Vcmax", "Ayield"),
+      checked == 1
+    ) |>
+    summarise(
+      n = n(),
+      mean = mean(mean, na.rm = TRUE),
+      .by = c(genus, trait)
+    )
+})
+```
+
+## References
+
+- LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. *GCB Bioenergy*. [doi:10.1111/gcbb.12420](https://doi.org/10.1111/gcbb.12420)
@@ -0,0 +1,193 @@
+---
+title: "Getting Started with betydata"
+vignette: >
+  %\VignetteIndexEntry{Getting Started with betydata}
+  %\VignetteEngine{quarto::html}
+  %\VignetteEncoding{UTF-8}
+---
+
+::: {.callout-note}
+## What you will learn
+
+- What data is available in betydata and how the 16 tables relate to each other
+- How to explore trait and yield observations using dplyr
+- Key concepts: traits, yields, QA/QC flags, and Plant Functional Types
+:::
+
+## What is betydata?
+
+The `betydata` package provides offline access to public data from [BETYdb](https://betydb.org), the Biofuel Ecophysiological Traits and Yields database. BETYdb is a centralized repository of plant trait measurements and crop yield data used in ecosystem modeling and agricultural research.
+
+A **trait** is a measurable characteristic of a plant, for example Specific Leaf Area (SLA, m2/kg), maximum carboxylation rate (Vcmax, umol/m2/s), or leaf nitrogen content (%). A **yield** is a measure of crop production per unit area (typically Mg/ha). Ecosystem models are parameterized from both kinds of data.
+
+## Loading the Package
+
+```{r}
+library(betydata)
+library(dplyr)
+```
+
+## Data Architecture {#sec-architecture}
+
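+The package contains **16 tables** organized in three tiers: a primary denormalized table, metadata tables, and relationship tables. The chunk below lists every table with its title: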
+
+```{r}
+#| label: tbl-tables
+#| tbl-cap: "All tables available in betydata"
+data(package = "betydata")$results[, c("Item", "Title")] |>
+  as.data.frame() |>
+  knitr::kable()
+```
+
+::: {.callout-tip}
+## Data Model
+
+The tables follow a relational structure:
+
+- **`traitsview`** is the primary denormalized table (pre-joined for convenience)
+- **Metadata tables** (`species`, `sites`, `variables`, `citations`, etc.) provide reference data
+- **Relationship tables** (`pfts_species`, `pfts_priors`, etc.) are many-to-many junction tables
+
+You can use `traitsview` for most analyses without joining anything. The metadata and relationship tables are available when you need additional detail or custom aggregations.
+:::
+
+## The Primary Table: traitsview {#sec-traitsview}
+
+The `traitsview` table is a denormalized view combining traits and yields with associated metadata. Key analytical columns are placed first for convenient interactive use:
+
+```{r}
+traitsview
+```
+
+### Key Columns {#sec-columns}
+
+| Column           | Description                                      | Example Values            |
+|------------------|--------------------------------------------------|---------------------------|
+| `trait`          | Variable name                                    | SLA, Vcmax, Ayield        |
+| `mean`           | Observed value                                   | 22.5, 38.1                |
+| `units`          | Measurement units                                | m2/kg, umol/m2/s          |
+| `scientificname` | Full species name                                | *Miscanthus x giganteus*  |
+| `genus`          | Genus                                            | Miscanthus, Panicum       |
+| `sitename`       | Research site                                    | Energy Farm, Urbana IL    |
+| `author`         | Citation author                                  | Heaton 2008               |
+| `checked`        | QA/QC status (0 = unchecked, 1 = verified)       | 0, 1                      |
+
+## Basic Exploration
+
+```{r}
+#| label: tbl-trait-counts
+#| tbl-cap: "Top 15 most common traits in betydata"
+traitsview |>
+  count(trait, sort = TRUE) |>
+  head(15) |>
+  knitr::kable()
+```
+
+## Data Quality: The `checked` Column {#sec-checked}
+
+::: {.callout-important}
+## Quality Control
+
+The `checked` column indicates data verification status:
+
+- **`1`** = Verified by an independent reviewer
+- **`0`** = Not yet reviewed (use with appropriate caution)
+- **`-1`** = Flagged as incorrect (**excluded** from this package)
+
+All data in this package is public (BETYdb `access_level = 4`).
+:::
+
+```{r}
+table(traitsview$checked, useNA = "ifany")
+
+verified <- traitsview |>
+  filter(checked == 1)
+nrow(verified)
+```
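+
+To gauge how much the `checked` filter matters for a given analysis, one option is to compute the same summary with and without it. The sketch below does this on an invented table, `toy_qc`; the numbers are illustrative only.
+
+```{r}
+library(dplyr)
+
+# Hypothetical observations with QA/QC flags (made-up values)
+toy_qc <- tibble(
+  trait = c("SLA", "SLA", "SLA", "Vcmax", "Vcmax"),
+  mean = c(20, 24, 40, 50, 70),
+  checked = c(1, 1, 0, 1, 0)
+)
+
+# Per-trait counts and means, for all rows and for verified rows only
+toy_compare <- toy_qc |>
+  summarise(
+    n_all = n(),
+    n_verified = sum(checked == 1),
+    mean_all = mean(mean),
+    mean_verified = mean(mean[checked == 1]),
+    .by = trait
+  )
+
+toy_compare
+```
+
+A large gap between `mean_all` and `mean_verified` would suggest reporting results for verified records separately.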
+
+## Support Tables {#sec-support}
+
+### Species Taxonomy
+
+The `species` table contains `r format(nrow(species), big.mark = ",")` entries with full taxonomic information:
+
+```{r}
+species |>
+  select(id, scientificname, genus, commonname)
+```
+
+### Variables (Trait Definitions)
+
+The `variables` table documents units, descriptions, and valid ranges for each measured trait:
+
+```{r}
+variables |>
+  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
+  select(name, units, description)
+```
+
+### Sites
+
+```{r}
+sites_with_climate <- sites |>
+  filter(!is.na(mat), !is.na(map))
+nrow(sites_with_climate)
+```
+
+## Example: Bioenergy Crop Yields {#sec-bioenergy}
+
+```{r}
+#| label: tbl-bioenergy
+#| tbl-cap: "Yield summary for key bioenergy genera"
+bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Salix", "Saccharum")
+
+yields <- traitsview |>
+  filter(
+    trait == "Ayield",
+    genus %in% bioenergy_genera,
+    !is.na(mean)
+  ) |>
+  select(genus, mean, units, sitename, author, citation_year, lat, lon)
+
+yields |>
+  summarise(
+    n = n(),
+    mean_yield = round(mean(mean, na.rm = TRUE), 1),
+    sd_yield = round(sd(mean, na.rm = TRUE), 1),
+    .by = genus
+  ) |>
+  knitr::kable(col.names = c("Genus", "N", "Mean Yield (Mg/ha)", "SD"))
+```
+
+## Working with Plant Functional Types (PFTs) {#sec-pfts}
+
+::: {.callout-note}
+## What is a PFT?
+
+A **Plant Functional Type** groups species with similar ecological characteristics for ecosystem modeling. Instead of parameterizing models for each species individually, PFTs like "temperate deciduous trees" or "C4 grasses" define shared parameter distributions. This approach is essential when species-level data is sparse and makes modeling tractable at large scales.
+:::
+
+```{r}
+miscanthus_sp <- species |>
+  filter(genus == "Miscanthus") |>
+  pull(id)
+
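+# Note: the key column in `pfts_species` is spelled `specie_id`,
+# unlike `species_id` in `traitsview`
+pfts_species |>
+  filter(specie_id %in% miscanthus_sp) |>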
+  left_join(pfts |> select(id, name), by = c("pft_id" = "id")) |>
+  distinct(name)
+```
+
+## Next Steps
+
+| Vignette                       | Description                                   |
+|--------------------------------|-----------------------------------------------|
+| `vignette("common_analyses")`  | Common analysis patterns with dplyr           |
+| `vignette("pfts-priors")`      | Working with PFTs and Bayesian priors         |
+| `vignette("manuscript")`       | Reproduce analyses from LeBauer et al. (2018) |
+
+## References
+
+- LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. *GCB Bioenergy*. [doi:10.1111/gcbb.12420](https://doi.org/10.1111/gcbb.12420)
+- LeBauer, D. S., et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. *Ecological Monographs*, 83(2), 133--154.
+- BETYdb documentation: <https://betydb.org>