Based both on the PPC and on the comparisons with different model selection criteria, we can conclude that the second Poisson model with random intercepts fits the data best. In principle, we could have expected this from our own intuition and the design of the study: the Poisson distribution is a natural choice for modelling counts, and random intercepts account for the hierarchical design (habitats nested within sites).
# Final model results
When we look at the model fit object, we see results similar to those of a frequentist model: we get an estimate of every parameter together with its uncertainty. On the other hand, this is clearly the output of a Bayesian model: we get information about the settings of the MCMC algorithm, a 95% credible interval (CI) instead of a confidence interval, and the $\hat{R}$ value for each parameter, as discussed earlier.

```{r results-fit-poisson}
# Look at the fit object of the Poisson model with random effects
fit_poisson2
```

A useful package for visualising the results of our final model is the [tidybayes](https://mjskay.github.io/tidybayes/articles/tidy-brms.html) package. Through this package, you can work with the posterior distributions as you would work with any dataset through the **tidyverse** package.
With the function `gather_draws()` you can take a certain number of samples from the posterior distributions of selected parameters and convert them into a long-format table. You usually do not want to select all posterior samples, because there are often more than you need. By specifying a seed you ensure that the same samples are drawn every time you rerun the script. You can then calculate summary statistics with the usual **dplyr** functions.

```{r results-fit-poisson-2}
fit_poisson2 %>%
  # gather 1000 posterior samples for 2 parameters in long format
  # (coefficient names are assumed here; list the actual names with
  #  get_variables(fit_poisson2))
  gather_draws(b_Intercept, b_habitatforest, ndraws = 1000, seed = 123)
```

Other useful functions of the **tidybayes** package are `median_qi()`, `mean_qi()`, ..., which you can use after `gather_draws()` instead of `group_by()` and `summarise()`.
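For instance, a minimal sketch (the coefficient names `b_Intercept` and `b_habitatforest` are assumptions; list the actual names with `get_variables(fit_poisson2)`):

```{r example-median-qi}
fit_poisson2 %>%
  # take 1000 posterior draws for two (assumed) parameters in long format
  gather_draws(b_Intercept, b_habitatforest, ndraws = 1000, seed = 123) %>%
  # posterior median and 95% quantile interval per parameter,
  # replacing a group_by()/summarise() pair
  median_qi(.value)
```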
We would now like to visualise the estimated number of species per habitat type with associated uncertainty. With the function `spread_draws()` you can take a certain number of samples from the posterior distribution and convert them into a wide format table. The average number of species in bogs according to our model is $\exp(\beta_0)$ and in forests $\exp(\beta_0+\beta_1)$. We show the posterior distributions with the posterior median and 60 and 90% credible intervals.

```{r results-fit-poisson-3}
fit_poisson2 %>%
  # spread 1000 posterior samples for 2 parameters in wide format
  # (coefficient names are assumed here; list the actual names with
  #  get_variables(fit_poisson2))
  spread_draws(b_Intercept, b_habitatforest, ndraws = 1000, seed = 123)
```

In addition to `stat_eye()`, you will find [here](https://mjskay.github.io/tidybayes/articles/tidy-brms.html#other-visualizations-of-distributions-stat_slabinterval) some other nice ways to visualise posterior distributions.
We see a clear difference in the number of species between the two habitats. But is this difference significant? We test the hypothesis that the expected numbers of species are equal in bogs and forests.

$$
\exp(\beta_0) = \exp(\beta_0+\beta_1)\\
\Rightarrow \beta_0 = \beta_0 + \beta_1\\
\Rightarrow \beta_1 = 0
$$

This can easily be done via the `hypothesis()` function of the **brms** package.
The argument `alpha` sets the level of the credible interval: for example, `alpha = 0.1` yields a 90% interval.
This allows hypothesis testing in a similar way to the frequentist null hypothesis testing framework.
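As a minimal sketch (the coefficient name `habitatforest` is an assumption; check the population-level effects with `fixef(fit_poisson2)`), testing $\beta_1 = 0$ could look like:

```{r example-hypothesis}
# Test whether the habitat effect is zero; alpha = 0.1 gives a 90% CI
hypothesis(fit_poisson2, "habitatforest = 0", alpha = 0.1)
```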
Let's go back to our very first model, where we used the Normal distribution. That model was equivalent to a linear regression with a categorical variable, which is also called an ANOVA; and if there are only two groups, an ANOVA is equivalent to a t-test. We can therefore take the opportunity to compare the results of our first (Bayesian) model with the results of a classical (frequentist) t-test.

```{r compare-frequentist}
# Extract summary statistics from the Bayesian model
```

We see that this indeed produces almost exactly the same results. Our Bayesian model estimates that on average `r round(diff_bog1, 3)` more ant species occur in forests than in bogs (90% credible interval: `r round(ll_diff_bog1, 3)` to `r round(ul_diff_bog1, 3)`). The t-test estimates that on average `r round(diff_bog2, 3)` more ant species occur in forests than in bogs (90% confidence interval: `r round(ll_diff_bog2, 3)` to `r round(ul_diff_bog2, 3)`).
# Deep Dive: `rstan`
## Stan: What? Why?!
The `brms` package is a convenience wrapper for the `rstan` package, which in turn ports `stan` functionality to R.
Stan is a modeling framework written in the C++ programming language, which implements many probabilistic ("Bayesian") modeling tools.
More info can be found on [the Stan website](https://mc-stan.org).
The advantage of `brms` is usability: many functions work out-of-the-box, with reasonable defaults.
However, the relative ease-of-use comes at the cost of flexibility and, to some degree, readability.

In contrast, Stan and `rstan` lean more toward the mathematical formulation of models.
Every aspect of the model has to be explicitly set, which can be an advantage (e.g. if you face non-standard use cases), or disadvantage (e.g. if you specify models in non-optimal ways).
To briefly give an impression, we will build the same models as above, using the Stan framework.
RMarkdown can handle `stan` code chunks, though more general model definitions are usually outsourced to a separate `*.stan` file.
Alternatively, you can define your model in a big text block, as shown below.
The simple poisson model resembles [one of the `stan`-dard examples](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#posterior-prediction-for-regressions), which you can refer to for all further details and more.
Sampling works pretty much the same as above, since at its core, `brms` is just Stan.
In other cases, it might pay off.
Know that Stan is there for you, do not hesitate to turn to its extensive documentation, and do not fear to give it a try!
## Homework: Hierarchical Model
To take your modeling skills even further, you may implement and sample the "random intercept" model.
In "Bayesian" terms, the [general terminology is "hierarchical" model](https://mc-stan.org/docs/stan-users-guide/regression.html#hierarchical-regression).
With Stan, po(i)ssibilities are almost endless - don't get lost in model building!