Skip to content

Commit 7b170ac

Browse files
committed
brms_rstan: poisson model, plain+hierarchical
1 parent d09e99e commit 7b170ac

3 files changed

Lines changed: 365 additions & 3 deletions

File tree

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
2+
data {
3+
int<lower=1> N; // sample size
4+
vector[N] habitat_obs; // habitat
5+
array[N] int<lower=0> sp_rich; // outcome variable: number of ants
6+
}
7+
8+
parameters {
9+
real intercept_rich; // intercept
10+
real habitat_effect; // slope
11+
}
12+
13+
model {
14+
// priors
15+
intercept_rich ~ normal(0, 1);
16+
habitat_effect ~ normal(0, 1);
17+
18+
// "posterior", "likelihood", ... give it a name!
19+
sp_rich ~ poisson_log(intercept_rich + habitat_effect * habitat_obs);
20+
21+
}

content/tutorials/r_brms/brms_eng/workshop_1_mcmc_en_brms_eng.Rmd

Lines changed: 174 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -632,7 +632,7 @@ The model is fitted using the `brm()` function. The syntax is very similar to fu
632632
- `file` and `file_refit` to save the model object after it has been fitted. If you run the code again and the model has already been saved, `brm()` will simply load this model instead of refitting it.
633633

634634

635-
```{r simple-model-fit-poisson, class.source = 'fold-show'}
635+
```{r simple-model-fit-poisson, class.source='fold-show'}
636636
# Fit Normal model
637637
fit_normal1 <- brm(
638638
formula = sp_rich ~ habitat, # specify the model
@@ -838,7 +838,7 @@ $$
838838

839839
So we need to estimate two parameters: $\beta_0$ and $\beta_1$ We use the same MCMC parameters as before. The only thing we need to adjust is the choice `family = poisson()`.
840840

841-
```{r poisson-model-fit, class.source = 'fold-show'}
841+
```{r poisson-model-fit, class.source='fold-show'}
842842
# Fit Poisson model
843843
fit_poisson1 <- brm(
844844
formula = sp_rich ~ habitat, # specify the model
@@ -900,7 +900,7 @@ $$
900900
b_0 \sim N(0, \sigma_b)
901901
$$
902902

903-
```{r rand-intercept-model-fit, class.source = 'fold-show'}
903+
```{r rand-intercept-model-fit, class.source='fold-show'}
904904
# Fit Poisson model with random intercept per site
905905
fit_poisson2 <- brm(
906906
formula = sp_rich ~ habitat + (1|site),
@@ -1062,6 +1062,177 @@ comp_waic %>%
10621062
Both based on the PPC and the comparisons with different model selection criteria, we can conclude that the second Poisson model with random intercepts fits the data best. In principle, we could have expected this based on our own intuition and the design of the study, i.e. the use of the Poisson distribution to model numbers and the use of random intercepts to control for a hierarchical design (habitats nested within sites).
10631063

10641064

1065+
## Deep Dive: `rstan`
1066+
### Stan: What? Why?!
1067+
The `brms` package is a convenience wrapper for the `rstan` package, which in turn ports `stan` functionality to R.
1068+
Stan is a modeling framework written in the `C` programming language, which implements many probabilistic ("Bayesian") modeling tools.
1069+
More info can be found on [the Stan website](https://mc-stan.org).
1070+
1071+
1072+
The advantage of `brms` is usability: many functions work out-of-the-box, with reasonable default values, and a syntax that is similar to what frequentists are habituated to.
1073+
However, the relative ease-of-use comes at the cost of flexibility, and do some degree, readability.
1074+
1075+
In contrast, Stan and `rstan` lean more to the mathematical formulation of models.
1076+
Every aspect of the model has to be explicitly set, which can be an advantage (e.g. if you face non-standard use cases), or disadvantage (e.g. if you secify models in non-optional ways).
1077+
1078+
1079+
To briefly give an impression, we will build the same models as above, using the Stan framework.
1080+
1081+
1082+
```{r}
1083+
library("rstan")
1084+
conflicted::conflicts_prefer(rstan::extract)
1085+
conflicted::conflicts_prefer(brms::loo)
1086+
```
1087+
1088+
### Model Definition
1089+
RMarkdown can handle `stan` code chunks, though more general model definition is outsourced to a separate "*.stan" file.
1090+
The simple poisson model resembles [one of the `stan`-dard examples](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html#posterior-prediction-for-regressions), which you can refer to for all further details and more.
1091+
1092+
1093+
```{stan, output.var="stan_poisson_model", class.source='fold-show'}
1094+
data {
1095+
int<lower=1> N; // sample size
1096+
vector[N] habitat_obs; // habitat
1097+
array[N] int<lower=0> sp_rich; // outcome variable: number of ants
1098+
}
1099+
1100+
parameters {
1101+
real intercept_rich; // intercept
1102+
real habitat_effect; // slope
1103+
}
1104+
1105+
model {
1106+
// priors
1107+
intercept_rich ~ normal(0, 1);
1108+
habitat_effect ~ normal(0, 1);
1109+
1110+
// "posterior", "likelihood", ... give it a name!
1111+
sp_rich ~ poisson_log(intercept_rich + habitat_effect * habitat_obs);
1112+
1113+
}
1114+
1115+
1116+
```
1117+
1118+
1119+
Take this as a "look behind the scenes"!
1120+
You have to explicitly define the model structure, priors, even the data types of input variables.
1121+
Yet note how even Stan is not without convenience functions: we can use the `poisson_log` posterior to model log rates.
1122+
1123+
1124+
When working outside RStudio/RMarkdown, you might prefer loading the model from a file:
1125+
1126+
```{r stan_load_model, eval=FALSE, class.source='fold-show'}
1127+
stan_poisson_model <- stan_model(
1128+
file = "./poisson_model.stan",
1129+
model_name = "stan poisson model"
1130+
)
1131+
```
1132+
1133+
1134+
### Sampling
1135+
1136+
Sampling does pretty much the same as above, since at the core, `brms` is just `stan`.
1137+
1138+
```{r stan_sampling}
1139+
stan_poisson_fit <- sampling(
1140+
stan_poisson_model,
1141+
list(
1142+
N = nrow(ants_df),
1143+
habitat_obs = as.integer(ants_df$habitat)-1,
1144+
sp_rich = ants_df$sp_rich
1145+
),
1146+
iter = niter,
1147+
chains = nchains,
1148+
cores = nparallel
1149+
)
1150+
1151+
stan_poisson_fit
1152+
```
1153+
1154+
1155+
A convenient way of quickly inspecting model fits is `shinystan`.
1156+
It opens a browser with shiny plots.
1157+
1158+
```{r eval=FALSE}
1159+
library("shinystan")
1160+
launch_shinystan(stan_poisson_fit)
1161+
```
1162+
1163+
1164+
Behold: model outcome is exactly as above.
1165+
In the present case, the benefit from turning to `stan` is very limited.
1166+
In other cases, it might pay off.
1167+
1168+
Know that Stan is there for you, do not hesitate to turn to its extensive documentation, and do not fear to give it a try!
1169+
1170+
1171+
### Homework: Hierarchical Model
1172+
To take your modeling skills even further, you may implement and sample the "random intercept" model.
1173+
In "Bayesian" terms, the [general terminology is "hierarchical" model](https://mc-stan.org/docs/stan-users-guide/regression.html#hierarchical-regression).
1174+
1175+
1176+
Below is the code which considers site-specific intercepts.
1177+
It works slightly different from the `+(1|site)` approach above: the upper one is an "offset" parametrization, whereas this time we use a hyperprior.
1178+
These are two common strategies often worth interchanging, with marginal or substantial effects on sampler performance.
1179+
1180+
1181+
```{stan, eval=FALSE, output.var="stan_hierarchical_poisson"}
1182+
data {
1183+
int<lower=1> N; // sample size
1184+
int<lower=1> L; // number of sites
1185+
vector[N] habitat_obs; // habitat
1186+
array[N] int<lower=1, upper=L> site_obs; // site
1187+
array[N] int<lower=0> sp_rich; // outcome variable: number of ants
1188+
}
1189+
1190+
parameters {
1191+
real intercept_rich; // "global" intercept
1192+
vector[L] site_effect; // the sr intercept at each site
1193+
real<lower=0> site_variation; // variation on site level
1194+
real habitat_effect; // habitat slope
1195+
}
1196+
1197+
model {
1198+
// priors
1199+
intercept_rich ~ normal(0, 1);
1200+
site_variation ~ cauchy(0, 1);
1201+
habitat_effect ~ normal(0, 1);
1202+
1203+
// site-wise intercept
1204+
for (i in 1:L) {
1205+
site_effect[i] ~ normal(intercept_rich, site_variation);
1206+
}
1207+
1208+
// "posterior", "likelihood", ... give it a name!
1209+
sp_rich ~ poisson_log(site_effect[site_obs] + habitat_effect * habitat_obs);
1210+
1211+
}
1212+
```
1213+
1214+
1215+
```{r, eval=FALSE, stan_hierarchical_sampling}
1216+
stan_poisson_fit <- sampling(
1217+
stan_poisson_model,
1218+
list(
1219+
N = nrow(ants_df),
1220+
L = length(levels(ants_df$site)),
1221+
habitat_obs = as.integer(ants_df$habitat)-1,
1222+
site_obs = as.integer(ants_df$site),
1223+
sp_rich = ants_df$sp_rich
1224+
),
1225+
iter = niter,
1226+
chains = nchains,
1227+
cores = nparallel
1228+
)
1229+
1230+
stan_poisson_fit
1231+
```
1232+
1233+
With Stan, po(i)ssibilities are almost endless - don't get lost in model building!
1234+
1235+
10651236
# Final model results
10661237

10671238
When we look at the model fit object, we see results that are similar to results we see when we fit a frequentist model. On the one hand we get an estimate of all parameters with their uncertainty, but on the other hand we see that this is clearly the output of a Bayesian model. We get information about the parameters we used for the MCMC algorithm, we get a 95% credible interval (CI) instead of a confidence interval and we also get the $\hat{R}$ value for each parameter as discussed earlier.

0 commit comments

Comments
 (0)