bios2.github.io/_freeze/posts/2022-03-22-bayes-for-the-busy-ecologist/index/execute-results/html.json at efdf2f1957023bc9a8a61cba6d0faf11809d6878 · bios2/bios2.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
{
  "hash": "f07663379a35eeae870d2aa0e2428ede",
  "result": {
    "markdown": "---\ntitle: \"Bayes for the Busy Ecologist\"\ndescription: \"This workshop presents one idea of a complete workflow for applied Bayesian statistics with real-world models that are actually used by biodiversity scientists.\"\nauthor:\n  - name: \"Andrew MacDonald\"\n    affiliation: \"BIOS² and Université de Sherbrooke\"\ndate: \"22-03-2022\"\nimage: image.jpg\ncategories: [Technical, EN]\ntoc: true\nnumber-sections: true\nnumber-depth: 1\n---\n\n\n\n\n# Applied Bayesian models for working ecologists {-}\n\nThis training covers how to write down a model, how to translate that into computer code, how to fit it to data and finally, how to work with the resulting posterior distribution. We’ll use Stan, which is a language for writing Bayesian models, and practice with some of the tools that help us work with Stan: rstanarm, brms, tidybayes, shinystan.\n\nThis training is intended for experienced users of Bayesian tools and for complete beginners who want to use such tools.\n\nThis 6 to 8h online workshop was conducted in 4 sessions: March 22, 24, 29, 31, 2022, from 11AM – 1 PM Pacific Time / 12 – 2 PM Mountain Time / 2-4 PM Eastern Time. The training was built and presented by Dr. Andrew MacDonald in English with bilingual support throughout.\n\nAndrew MacDonald is the Training Coordinator of the BIOS² program. He is a quantitative ecologist who works mostly in R and a well-experienced trainer in teaching quantitative and computational methods. He is currently a research professional at Université de Sherbrooke.\n\n# Day 1\n\n# Day 2: Hierarchical and nonlinear models\n\nIn which we discuss many groups and curving lines.\n\n## Outline\n\n* Return to previous model: Poisson regression \n* Panel regression version of this model\n* Bayesian workflow\n* Brief foray into moment matching\n* Nonlinear model\n* Nonlinear model with random effects\n\n## Quick review\n\n### Bird masses\n\nThis example is based on work by Marie-Eve at UdeS! \n\nWe imagine a model like the following: \n\n$$\n\\begin{align}\n\\text{Nestlings}_i & \\sim \\text{Poisson}(\\lambda_i) \\\\\n\\text{log}(\\lambda_i) &= \\beta_0 + \\beta_1 \\times \\text{Mass}_i \\\\\n\\beta_0 & \\sim \\text{Normal}(??, ??) \\\\\n\\beta_1 & \\sim \\text{Normal}(??, ??)\n\\end{align}\n$$\n\n\n$i$ keeps track of which bird we are talking about. You can think of it as \"bird number i\"\n\nNote: We could also write the model like this:\n\n$$\n\\begin{align}\n\\text{Nestlings}_i & \\sim \\text{Poisson}(e^{\\beta_0} \\times e^{\\beta_1 \\times \\text{Mass}_i}) \\\\\n\\beta_0 & \\sim \\text{Normal}(??, ??) \\\\\n\\beta_1 & \\sim \\text{Normal}(??, ??)\n\\end{align}\n$$\n\n### Centering variables\n\nCentering variables is one of the most important things we can do to help our models be more interpretable. This also helps us to set good priors. \n\nCentering a variable means to subtract the mean from the variable:\n\n$$\n\\begin{align}\n\\text{Nestlings}_i & \\sim \\text{Poisson}(\\lambda_i) \\\\\n\\text{log}(\\lambda_i) &= \\beta_0 + \\beta_1 \\times (\\text{Mass}_i - \\overline{\\text{Mass}}) \\\\\n\\beta_0 & \\sim \\text{Normal}(??, ??) \\\\\n\\beta_1 & \\sim \\text{Normal}(??, ??)\n\\end{align}\n$$\n\n*Question* How does this change the meaning of $\\beta_0$ and/or $\\beta_1$, if at all? (Hint: what will be the equation for a bird who has exactly average mass?) \n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\n\nn_birds <- 15\navg_nestlings_at_avg_mass <- log(4.2)\neffect_of_one_gram <- .2\n\nmother_masses_g <- rnorm(n_birds, mean = 15, sd = 3)\navg_mother_mass <- mean(mother_masses_g)\n\nlog_average_nestlings <- avg_nestlings_at_avg_mass + \n  effect_of_one_gram * (mother_masses_g - avg_mother_mass)\n\nnestlings <- rpois(n = n_birds, lambda = exp(log_average_nestlings))\n```\n:::\n\n\nPlot these to get an idea of it:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsuppressPackageStartupMessages(library(tidyverse))\nimaginary_birds <- tibble(mother_masses_g, nestlings)\n\nggplot(imaginary_birds, aes(x = mother_masses_g, y = nestlings)) + \n  geom_point()\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\n*NOTE* We can also fit this very same model by frequentist statistics, using `lm`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncoef(glm(nestlings ~ 1 + I(mother_masses_g - mean(mother_masses_g)), family = \"poisson\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n                               (Intercept) \n                                 1.4138103 \nI(mother_masses_g - mean(mother_masses_g)) \n                                 0.1727791 \n```\n:::\n\n```{.r .cell-code}\n# compare to known values\navg_nestlings_at_avg_mass\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1.435085\n```\n:::\n\n```{.r .cell-code}\neffect_of_one_gram\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.2\n```\n:::\n:::\n\n\n### Bayesian workflow: define a model and priors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(brms)\n\nimaginary_birds_centered <- imaginary_birds |> \n  mutate(mother_mass_g_cen = mother_masses_g - mean(mother_masses_g))\n\nbird_form <- bf(nestlings ~ 1 + mother_mass_g_cen, family = poisson(link = \"log\"))\n\nget_prior(bird_form, data = imaginary_birds_centered)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n                  prior     class              coef group resp dpar nlpar lb ub\n                 (flat)         b                                              \n                 (flat)         b mother_mass_g_cen                            \n student_t(3, 1.4, 2.5) Intercept                                              \n       source\n      default\n (vectorized)\n      default\n```\n:::\n:::\n\n\nWe set a prior for each parameter. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nbird_priors <- c(\n  prior(normal(1, .5), class = \"Intercept\"),\n  prior(normal(.1, .1), class = \"b\", coef = \"mother_mass_g_cen\")\n)\n```\n:::\n\n\n#### Prior predictive checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprior_predictions <- brm(bird_form,\n                         data = imaginary_birds_centered,\n                         prior = bird_priors, \n                         sample_prior = \"only\", \n                         file = \"bird_model\",\n                         file_refit = \"on_change\",\n                         refresh = FALSE)\n```\n:::\n\n\nPlot a few of these:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidybayes)\nimaginary_birds_centered |> \n  add_predicted_draws(prior_predictions, ndraws = 6, seed = 4321) |> \n  ggplot(aes(x = mother_masses_g, y = .prediction)) + geom_point() + facet_wrap(~.draw)\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n\n*Question* are we satisfied with these priors?\n\n#### Fit to the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbird_posterior <- update(prior_predictions, sample_prior = \"yes\", \n                         file = \"bird_posterior\", \n                         file_refit = \"on_change\", refresh = FALSE)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nThe desired updates require recompiling the model\n```\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(bird_posterior)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Family: poisson \n  Links: mu = log \nFormula: nestlings ~ 1 + mother_mass_g_cen \n   Data: imaginary_birds_centered (Number of observations: 15) \n  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;\n         total post-warmup draws = 4000\n\nPopulation-Level Effects: \n                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS\nIntercept             1.39      0.13     1.14     1.64 1.00     1979     2070\nmother_mass_g_cen     0.16      0.04     0.07     0.25 1.00     2376     2463\n\nDraws were sampled using sampling(NUTS). For each parameter, Bulk_ESS\nand Tail_ESS are effective sample size measures, and Rhat is the potential\nscale reduction factor on split chains (at convergence, Rhat = 1).\n```\n:::\n\n```{.r .cell-code}\nknitr::kable(head(tidybayes::tidy_draws(bird_posterior)))\n```\n\n::: {.cell-output-display}\n| .chain| .iteration| .draw| b_Intercept| b_mother_mass_g_cen| prior_Intercept| prior_b_mother_mass_g_cen|    lprior|      lp__| accept_stat__| stepsize__| treedepth__| n_leapfrog__| divergent__| energy__|\n|------:|----------:|-----:|-----------:|-------------------:|---------------:|-------------------------:|---------:|---------:|-------------:|----------:|-----------:|------------:|-----------:|--------:|\n|      1|          1|     1|    1.450523|           0.1357033|       0.7548442|                 0.2027851| 0.6881775| -29.18266|     0.9686242|  0.8998242|           2|            7|           0| 29.84270|\n|      1|          2|     2|    1.415657|           0.1138594|       0.7622771|                 0.0676872| 0.8027096| -29.58861|     0.9127170|  0.8998242|           2|            3|           0| 29.93907|\n|      1|          3|     3|    1.456546|           0.1777444|       0.7965764|                 0.1959839| 0.4387761| -29.24064|     0.9937692|  0.8998242|           2|            3|           0| 29.87044|\n|      1|          4|     4|    1.440214|           0.1352149|       0.7316238|                 0.1870379| 0.7082742| -29.17389|     0.9942078|  0.8998242|           2|            7|           0| 29.34977|\n|      1|          5|     5|    1.436886|           0.1975367|       1.2892340|                 0.0588215| 0.3004459| -29.50809|     0.9286261|  0.8998242|           2|            7|           0| 29.73433|\n|      1|          6|     6|    1.450670|           0.0961187|       0.4208806|                 0.1258187| 0.7508956| -30.07357|     0.9696699|  0.8998242|           2|            3|           0| 30.30790|\n:::\n:::\n\n\nHow do our priors and posteriors compare?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(ggridges)\ntidybayes::tidy_draws(bird_posterior) |> \n  select(.draw, b_Intercept:prior_b_mother_mass_g_cen) |> \n  pivot_longer(-.draw) |> \n  ggplot(aes(x = value, y = name)) + geom_density_ridges()\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n:::\n\n\nCan we draw the regression line? \n\n\n::: {.cell}\n\n```{.r .cell-code}\naverage_mom <- mean(mother_masses_g)\n\nrange(imaginary_birds_centered$mother_mass_g_cen)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] -6.025202  4.265214\n```\n:::\n\n```{.r .cell-code}\ntibble(mother_mass_g_cen = modelr::seq_range(imaginary_birds_centered$mother_mass_g_cen, \n                                             n = 10)) |> \n  tidybayes::add_epred_draws(bird_posterior) |> \n  ungroup() |> \n  ggplot(aes(x = average_mom + mother_mass_g_cen, y = .epred)) + \n  stat_lineribbon() + \n  scale_fill_brewer(palette = \"Greens\", direction = -1) + \n  geom_point(aes(x = mother_masses_g, y = nestlings),\n             data = imaginary_birds_centered, pch = 21,\n             fill = \"orange\", size = 3)\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n:::\n\n\nLet's also try drawing the prediction intervals.\n\n\n::: {.cell}\n\n```{.r .cell-code}\naverage_mom <- mean(mother_masses_g)\n\nrange(imaginary_birds_centered$mother_mass_g_cen)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] -6.025202  4.265214\n```\n:::\n\n```{.r .cell-code}\ntibble(mother_mass_g_cen = modelr::seq_range(imaginary_birds_centered$mother_mass_g_cen, \n                                             n = 10)) |> \n  tidybayes::add_predicted_draws(bird_posterior) |> \n  ungroup() |> \n  ggplot(aes(x = average_mom + mother_mass_g_cen, y = .prediction)) + \n  stat_lineribbon() + \n  scale_fill_brewer(palette = \"Greens\", direction = -1) + \n  geom_point(aes(x = mother_masses_g, y = nestlings),\n             data = imaginary_birds_centered, pch = 21,\n             fill = \"orange\", size = 3)\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n:::\n\n\nOther checks we can do:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbird_posterior_onlyparam <- update(prior_predictions, sample_prior = \"no\", \n                         file = \"bird_posterior\", \n                         file_refit = \"on_change\", refresh = FALSE)\n\nshinystan::launch_shinystan(bird_posterior_onlyparam)\n```\n:::\n\n\n\n## Multilevel models\n\nBased on the awesome vignette for vignette for `tidybayes`\n\nWe begin by sampling some data from five different \"conditions\":\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(modelr)\nset.seed(5)\nn <- 10\nn_condition <- 5\nABC <-\n  data_frame(\n    condition = rep(c(\"A\", \"B\", \"C\", \"D\", \"E\"), n),\n    response = rnorm(n * 5, c(0, 1, 2, 1, -1), 0.5)\n  )\n\nABC %>%\n  ggplot(aes(y = condition, x = response)) +\n  geom_point(pch = 21, size = 4, stroke = 1.4, fill = \"#41b6c4\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/setup-vig-1.png){width=672}\n:::\n:::\n\n\nAnd by fitting a model to these data, with varying intercepts for each group:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm <- brm(\n  response ~ (1 | condition), data = ABC, \n  control = list(adapt_delta = .99),\n  prior = c(\n    prior(normal(0, 1), class = Intercept),\n    prior(student_t(3, 0, 1), class = sd),\n    prior(student_t(3, 0, 1), class = sigma)\n  )\n)\n```\n:::\n\n\nAn easy way to visualize these results is with a _ridgeline plot_ as above\n\n\n::: {.cell}\n\n```{.r .cell-code}\nABC %>%\n  modelr::data_grid(condition) %>%\n  tidybayes::add_predicted_draws(m) %>%\n  ggplot(aes(x = .prediction, y = condition)) +\n  geom_density_ridges(fill = \"#41b6c4\") + \n  theme_minimal()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.102\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/m_plot-1.png){width=672}\n:::\n:::\n\n\nAlright. This used the simple vanilla option, `add_predicted_samples(m)`. This uses the default options for making predictions, which recall is \"NULL (default), include all group-level effects\". If you set `add_predicted_samples(m, re_formula = NULL)`, you'll get exactly the same figure.  \n\nSo we can see that to \"include\" an effect is to take the actual estimated intercepts for each _specific group we studied_ and use them to make new predictions for the same groups. This is **Case 1** from McElreath's list (though in this case, because we only have groups and nothing else, Case 1 and 2 are the same). \n\nWe can also say the **exact same thing** using a formula: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nABC %>%\n  data_grid(condition) %>%\n  add_predicted_draws(m, re_formula = ~(1|condition)) %>%\n  ggplot(aes(x = .prediction, y = condition)) +\n  geom_density_ridges(fill = \"#41b6c4\") +  \n  theme_minimal()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.1\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/m_re_condition-1.png){width=672}\n:::\n:::\n\n\nThat's right, there are three ways to say the exact same thing: say nothing, say `NULL`, or say the original \"random effects\" formula^[this impulse in R to \"help your users\" by making it possible to say a great deal by saying almost nothing is... actually pretty counterproductive, I'd argue? But that's a different post]. You go with what you feel in your heart is right, but I prefer the formula.   \n\nIn all three cases, we are using the model to predict the means for the groups in our varying-intercepts model. This is what the documentation means by \"including\" these varying intercepts.\n\n### Squishing those random effects\n\nOK, so that was three separate ways to make predictions for the _same_ groups. What else can we do? Let's try that thing with the `NA` argument, which means \"include no group-level effects\":\n\n\n::: {.cell}\n\n```{.r .cell-code}\nABC %>%\n  data_grid(condition) %>%\n  add_predicted_draws(m, re_formula = NA,\n                        n = 2000) %>%\n  ggplot(aes(x = .prediction, y = condition)) +\n  geom_density_ridges(fill = \"#41b6c4\") +    theme_minimal()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.142\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/m_plot_NA_TRUE-and_zero-1.png){width=672}\n:::\n:::\n\n\nAh, so if you do this, all the groups come out the same! But if they're all the same, what do they represent? It seems reasonable that they represent the model's intercept, as if the varying intercepts were all 0. Let's calculate predicitons that ignore the varying effects -- that is, using only the model intercept and the standard deviation of the response -- using a bit of [handy `purrr` magic]^[no magic required! `rnorm` is already vectorized]:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm %>% \n  spread_draws(b_Intercept, sigma) %>% \n  mutate(prediction = rnorm(length(b_Intercept), b_Intercept, sigma),\n         #map2_dbl(b_Intercept, sigma, ~ rnorm(1, mean = .x, sd = .y)),\n         Prediction = \"prediction\") %>% #glimpse %>% \n  ggplot(aes(x = prediction, y = Prediction)) +\n  geom_density_ridges(fill = \"#41b6c4\") +    \n  theme_minimal()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.119\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\nAs you can see, this distribution has exactly the same shape as the five in the previous figure! It is as if we calculated the predictions for a group which was exactly at the average (in other words, it had a varying intercept of 0.) In the Rethinking book, readers are taught to do this in a much more explicit way: you actually generate all the 0 intercepts yourself, and give that to the model in place of the estimated intercepts! A very manual and concrete way to \"set something to 0\".  \n\n`brms` does this too. As the documentation says\n>NA values within factors in newdata, are interpreted as if all dummy variables of this factor are zero.\n\nThe `brms` phrasing certainly takes less space, though it also requires you to remember that this is what NA gets you!\n\nWe can also remove random effects from our predictions by excluding them from the `re_formula`. In our model, we have only one varying effect -- yet an even simpler formula is possible, a model with no intercept at all:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nABC %>%\n  data_grid(condition) %>%\n  add_predicted_draws(m, re_formula = ~ 0,\n                        n = 2000) %>%\n  ggplot(aes(x = .prediction, y = condition)) +\n  geom_density_ridges(fill = \"#41b6c4\") + theme_minimal() \n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.14\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n\nOnce again, the same distribution appears: it is as if all group effects had been set to zero. If we had two random effects and omitted one, this is what we would get for the omitted effect -- the expected value if all its effects were 0.\n\n### New levels\n\nI'm going to show how to create predictions for new levels, but first I'm going to show two mistakes that I made frequently while learning:\n\nFirst, asking for new levels without specifying `allow_new_levels = TRUE`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# this does not work at all!!\ndata_frame(condition = \"bugaboo\") %>%\n  add_predicted_draws(m, re_formula = ~(1|condition),\n                        n = 2000)\n```\n\n::: {.cell-output .cell-output-error}\n```\nError: Levels 'bugaboo' of grouping factor 'condition' cannot be found in the fitted model. Consider setting argument 'allow_new_levels' to TRUE.\n```\n:::\n:::\n\n\nThat fails because I tried to pass in a level of my grouping variable that _wasn't_ in the original model! \n\nSecond, passing in new levels -- but telling the function to just ignore them:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_frame(condition = \"bugaboo\") %>%\n  add_predicted_draws(m, re_formula = NA,#~(1|condition),\n                        n = 2000) %>%\n  ggplot(aes(x = .prediction, y = condition)) +\n  geom_density_ridges(fill = \"#41b6c4\") + \n  theme_minimal()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.142\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/m_data_NA-1.png){width=672}\n:::\n:::\n\n\nHere, I'm still passing in the unknown level -- but the function doesn't complain, because I'm not including random effects at all! This is the same result from above, when we used `NA` or `~0` to remove varying effects altogether. This is definitely something to watch for if you are passing in new data (I made this mistake, and it cost me an afternoon!)\n\nIf we avoid both of these errors, we get what we expect: our means for our original groups, and a new predicted mean for `\"bugaboo\"`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nABC %>%\n  data_grid(condition) %>% \n  add_row(condition = \"bugaboo\") %>%\n  add_predicted_draws(m, re_formula = ~(1|condition),\n                        allow_new_levels = TRUE,\n                        n = 2000) %>%\n  ggplot(aes(x = .prediction, y = condition)) +\n  geom_density_ridges(fill = \"#41b6c4\") +    theme_minimal()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPicking joint bandwidth of 0.131\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/new_level-1.png){width=672}\n:::\n:::\n\n\nHere you can see that the new level is much flatter than the other original five. It comes from the same population as the others, which is rather variable (the group means are sort of different to each other). As a result, this new distribution is quite wide, including all that uncertainty. \n\nAn ecologist might do something like this if we were had data on _some_ species in a community, but wanted to make predictions for new, as yet unobserved, species we might find next year.\n\n# Day 3: Offsets\n\nIn which we cover how to do a count analysis for different efforts or exposure.\n\n## Outline\n\n- Poisson count model\n- Bayesian Poisson count model\n\n## Today's data: Dandelions\n\nLet's imagine that we have counted dandelions. \n\nDandelions occur on average 6 per square meter\n\nHowever we have five kinds of quadrat: 1, 4, 9 and 25 square meters\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n\nimaginary_dandelions <- tibble(quadrat_size = rep(c(1,4, 9, 25), each = 15),\n       n_per_m2 = purrr::map(quadrat_size, rpois, lambda = 6),\n       obs_dandelions = map_dbl(n_per_m2, sum))\n\nggplot(imaginary_dandelions, aes(x = obs_dandelions)) + geom_histogram() + \n  facet_wrap(~quadrat_size)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n\nHow can we get the correct number of dandelions? \n\n## Poisson count model\n\n$$\n\\begin{align}\ny &\\sim \\text{Poisson}(\\lambda) \\\\\n\\text{log}(\\lambda) &= \\beta\n\\end{align}\n$$\n$\\lambda$ is the average response. If we want to measure the average _per unit effort_, we can do that too:\n\n$$\n\\begin{align}\ny &\\sim \\text{Poisson}(\\lambda) \\\\\n\\text{log}(\\lambda/Q) &= \\beta\n\\end{align}\n$$\n\n\n\n\n$$\n\\begin{align}\ny &\\sim \\text{Poisson}(\\lambda) \\\\\n\\text{log}(\\lambda) - \\text{log}(Q) &= \\beta\n\\end{align}\n$$\n\n\n$$\n\\begin{align}\ny &\\sim \\text{Poisson}(\\lambda) \\\\\n\\text{log}(\\lambda) &= \\beta + \\text{log}(Q)\n\\end{align}\n$$\n\nIn other words, we need a way to add a log coefficient to a model and give it a slope of _exactly one_. Fortunately the function `offset()` is here to do exactly this:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndandelion_model <- glm(obs_dandelions ~ 1, family = poisson(link = \"log\"), data = imaginary_dandelions)\nsummary(dandelion_model) \n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nglm(formula = obs_dandelions ~ 1, family = poisson(link = \"log\"), \n    data = imaginary_dandelions)\n\nDeviance Residuals: \n   Min      1Q  Median      3Q     Max  \n-9.802  -6.227  -2.658   2.539  11.652  \n\nCoefficients:\n            Estimate Std. Error z value Pr(>|z|)    \n(Intercept)  4.03836    0.01714   235.6   <2e-16 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for poisson family taken to be 1)\n\n    Null deviance: 2909.7  on 59  degrees of freedom\nResidual deviance: 2909.7  on 59  degrees of freedom\nAIC: 3230.4\n\nNumber of Fisher Scoring iterations: 5\n```\n:::\n:::\n\n\nThis gives the wrong answer! \n \n\n::: {.cell}\n\n```{.r .cell-code}\ndandelion_model <- glm(obs_dandelions ~ 1 + offset(log(quadrat_size)),\n                       family = poisson(link = \"log\"),\n                       data = imaginary_dandelions)\nsummary(dandelion_model) \n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nglm(formula = obs_dandelions ~ 1 + offset(log(quadrat_size)), \n    family = poisson(link = \"log\"), data = imaginary_dandelions)\n\nDeviance Residuals: \n     Min        1Q    Median        3Q       Max  \n-1.83462  -0.45999   0.07473   0.46032   2.07858  \n\nCoefficients:\n            Estimate Std. Error z value Pr(>|z|)    \n(Intercept)  1.76109    0.01714   102.7   <2e-16 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for poisson family taken to be 1)\n\n    Null deviance: 43.464  on 59  degrees of freedom\nResidual deviance: 43.464  on 59  degrees of freedom\nAIC: 364.13\n\nNumber of Fisher Scoring iterations: 4\n```\n:::\n:::\n\n\nThe coefficient should be close to 6, after we reverse the link function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexp(coef(dandelion_model)[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5.818803\n```\n:::\n:::\n\n\n## Do it the Bayes Way\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(brms)\n\ndandelion_bf <- bf(obs_dandelions ~ 1 + offset(log(quadrat_size)), \n                   family = poisson(link = \"log\"))\n\nget_prior(dandelion_bf, data = imaginary_dandelions)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nIntercept ~ student_t(3, 1.89940130916892, 2.5)\n```\n:::\n\n```{.r .cell-code}\ndandelion_prior <- prior(normal(2, 1), class = \"Intercept\")\n\ndandelion_model <- brm(formula = dandelion_bf,\n                       data = imaginary_dandelions, \n                       prior = dandelion_prior)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nCompiling Stan program...\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nStart sampling\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\nSAMPLING FOR MODEL '712e52da308896603a6860da7d30e1d5' NOW (CHAIN 1).\nChain 1: \nChain 1: Gradient evaluation took 2.3e-05 seconds\nChain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.23 seconds.\nChain 1: Adjust your expectations accordingly!\nChain 1: \nChain 1: \nChain 1: Iteration:    1 / 2000 [  0%]  (Warmup)\nChain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)\nChain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)\nChain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)\nChain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)\nChain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)\nChain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)\nChain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)\nChain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)\nChain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)\nChain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)\nChain 1: Iteration: 2000 / 2000 [100%]  (Sampling)\nChain 1: \nChain 1:  Elapsed Time: 0.028059 seconds (Warm-up)\nChain 1:                0.026169 seconds (Sampling)\nChain 1:                0.054228 seconds (Total)\nChain 1: \n\nSAMPLING FOR MODEL '712e52da308896603a6860da7d30e1d5' NOW (CHAIN 2).\nChain 2: \nChain 2: Gradient evaluation took 9e-06 seconds\nChain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.\nChain 2: Adjust your expectations accordingly!\nChain 2: \nChain 2: \nChain 2: Iteration:    1 / 2000 [  0%]  (Warmup)\nChain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)\nChain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)\nChain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)\nChain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)\nChain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)\nChain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)\nChain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)\nChain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)\nChain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)\nChain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)\nChain 2: Iteration: 2000 / 2000 [100%]  (Sampling)\nChain 2: \nChain 2:  Elapsed Time: 0.02398 seconds (Warm-up)\nChain 2:                0.024673 seconds (Sampling)\nChain 2:                0.048653 seconds (Total)\nChain 2: \n\nSAMPLING FOR MODEL '712e52da308896603a6860da7d30e1d5' NOW (CHAIN 3).\nChain 3: \nChain 3: Gradient evaluation took 1.2e-05 seconds\nChain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.\nChain 3: Adjust your expectations accordingly!\nChain 3: \nChain 3: \nChain 3: Iteration:    1 / 2000 [  0%]  (Warmup)\nChain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)\nChain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)\nChain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)\nChain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)\nChain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)\nChain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)\nChain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)\nChain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)\nChain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)\nChain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)\nChain 3: Iteration: 2000 / 2000 [100%]  (Sampling)\nChain 3: \nChain 3:  Elapsed Time: 0.029274 seconds (Warm-up)\nChain 3:                0.02456 seconds (Sampling)\nChain 3:                0.053834 seconds (Total)\nChain 3: \n\nSAMPLING FOR MODEL '712e52da308896603a6860da7d30e1d5' NOW (CHAIN 4).\nChain 4: \nChain 4: Gradient evaluation took 9e-06 seconds\nChain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.\nChain 4: Adjust your expectations accordingly!\nChain 4: \nChain 4: \nChain 4: Iteration:    1 / 2000 [  0%]  (Warmup)\nChain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)\nChain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)\nChain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)\nChain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)\nChain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)\nChain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)\nChain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)\nChain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)\nChain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)\nChain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)\nChain 4: Iteration: 2000 / 2000 [100%]  (Sampling)\nChain 4: \nChain 4:  Elapsed Time: 0.023883 seconds (Warm-up)\nChain 4:                0.021166 seconds (Sampling)\nChain 4:                0.045049 seconds (Total)\nChain 4: \n```\n:::\n:::\n\n\nLook at the Stan code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstancode(dandelion_model)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n// generated with brms 2.18.0\nfunctions {\n}\ndata {\n  int<lower=1> N;  // total number of observations\n  int Y[N];  // response variable\n  vector[N] offsets;\n  int prior_only;  // should the likelihood be ignored?\n}\ntransformed data {\n}\nparameters {\n  real Intercept;  // temporary intercept for centered predictors\n}\ntransformed parameters {\n  real lprior = 0;  // prior contributions to the log posterior\n  lprior += normal_lpdf(Intercept | 2, 1);\n}\nmodel {\n  // likelihood including constants\n  if (!prior_only) {\n    // initialize linear predictor term\n    vector[N] mu = rep_vector(0.0, N);\n    mu += Intercept + offsets;\n    target += poisson_log_lpmf(Y | mu);\n  }\n  // priors including constants\n  target += lprior;\n}\ngenerated quantities {\n  // actual population-level intercept\n  real b_Intercept = Intercept;\n}\n```\n:::\n:::\n\n\nLook at posterior distribution of parameter:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# as.matrix(dandelion_model) |> head()\n\nlibrary(tidybayes)\n\ntidy_draws(dandelion_model) |> \n  ggplot(aes(x = exp(b_Intercept))) + \n  geom_histogram() + \n  geom_vline(xintercept = 6, col = \"red\", lwd = 3)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.\n```\n:::\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-21-1.png){width=672}\n:::\n:::\n",
    "supporting": [
      "index_files"
    ],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {},
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}