-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathv61_pedigree_model_fitting.Rmd
More file actions
376 lines (278 loc) · 17.5 KB
/
v61_pedigree_model_fitting.Rmd
File metadata and controls
376 lines (278 loc) · 17.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
---
title: "Extended: Fitting Pedigree-Based Variance Component Models"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Extended: Fitting Pedigree-Based Variance Component Models}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE, echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
# source('https://vipbg.vcu.edu/vipbg/OpenMx2/software/getOpenMx.R')
library(BGmisc)
library(ggpedigree) # for pedigree data)
library(tidyverse)
has_openmx <- requireNamespace("OpenMx", quietly = TRUE)
if (has_openmx) {
library(OpenMx)
mxOption(key = "Number of Threads", value = imxGetNumThreads())
} else {
message(
"OpenMx is not installed. Code will be shown but not executed.\n",
"Install with: install.packages('OpenMx')"
)
}
bgmisc_testing_env <- Sys.getenv("BGMISC_TESTING", unset = "")
bgmisc_testing <- tolower(bgmisc_testing_env) %in% c("1", "true", "yes")
run_models <- has_openmx && interactive() # && !bgmisc_testing
rds_dir <- file.path("extdata/")
rds_file_ace <- file.path(rds_dir, "fitted_multi_ace.rds")
rds_file_mace <- file.path(rds_dir, "fitted_multi_mace.rds")
rds_file_ce <- file.path(rds_dir, "fitted_multi_ce.rds")
```
```{r, include = FALSE}
run_models
has_openmx
interactive()
bgmisc_testing
bgmisc_testing_env
```
# Introduction
This vignette extends the example from `vignette("v60_pedigree_model_fitting", package = "BGmisc")` to show how to fit models to multiple families simultaneously. The key functions are `buildOneFamilyGroup()` and `buildPedigreeMx()`, which translate pedigree data into OpenMx model specifications.
## Scaling Up to Many Families
Here we replicate several estimates of heritability across multiple families of red squirrels. We use the `redsquirrels_full` dataset from the `ggpedigree` package, which contains pedigree and phenotypic data on red squirrels from the Kluane region of the Yukon, Canada. The phenotype we analyze is lifetime reproductive success (LRS), which is a count of the number of offspring that survive to a certain age.
```{r krsp-prep}
library(ggpedigree) # for pedigree data)
library(tidyverse)
data("redsquirrels_full")
ped_krsp <- redsquirrels_full |>
transmute(
ID = as.integer(personID),
momID = as.integer(momID),
dadID = as.integer(dadID),
sex = sex,
famID = as.integer(famID),
lrs = lrs
)
summarizeFamilies(ped_krsp, famID = "famID")$family_summary |> arrange(desc(count))
```
As you can see, there are `r nrow(ped_krsp)` squirells in `r n_distinct(ped_krsp$famID)` families in this dataset. However, but we are missing several phenotyped individuals (i.e., individuals with non-missing LRS). To fit a multigroup pedigree model, we need to subset to families that have a sufficient number of phenotyped individuals. Here we set a minimum family size of 10 phenotyped individuals, which leaves us with a reduced number of families to include in the analysis.
```{r}
minim_family_size <- 10
ped_krsp_subset <- ped_krsp |>
group_by(famID) |>
filter(sum(!is.na(lrs)) >= minim_family_size) |>
ungroup()
id_families <- unique(ped_krsp_subset$famID)
n_families <- length(id_families)
```
We now have `r n_families` families with at least `r minim_family_size` phenotyped individuals each. Once we subset to the families with sufficient phenotyped individuals, we can prepare the relatedness matrices and phenotypic data for each family. We pre-allocate lists to store these matrices and data for each family, which will be used to build the OpenMx models.
```{r}
# Pre-allocate storage
add_list <- vector("list", length(n_families))
cn_list <- vector("list", length(n_families))
mt_list <- vector("list", length(n_families))
obs_ids_list <- vector("list", length(n_families))
pheno_list <- vector("list", length(n_families))
```
Creating starting values for the variance components is important for model convergence. Here we set some reasonable starting values based on prior knowledge or expectations about the trait and the population. These starting values can be adjusted based on the specific dataset and trait being analyzed.
```{r}
# Starting values for variance components
start_vars <- list(
ad2 = 0.1, # additive genetic
cn2 = 0.1, # common nuclear environment
ce2 = 0, # common extended (not estimated here)
mt2 = 0.1, # mitochondrial
dd2 = 0, # dominance (not estimated here)
am2 = 0, # A x Mt interaction (not estimated here)
ee2 = 0.7 # unique environment
)
```
### Model Building
Now we loop through each family, extract the pedigree and phenotypic data, and prepare the relatedness matrices (additive genetic, common nuclear environment, and mitochondrial) for each family. We also prepare the phenotypic data in the format required for OpenMx. Finally, we build the group models for ACE, MACE, and CE models for each family. The `buildOneFamilyGroup()` function is used to create the model specification for each family, and the `buildPedigreeMx()` function is used to combine these group models into a single multigroup model that can be fitted in OpenMx. As you can see, we are fitting three different models: ACE (additive genetic, common environment, unique environment), MACE (additive genetic, common environment, mitochondrial, unique environment), and CE (common environment, unique environment). This allows us to compare the models and assess the contribution of each variance component to the trait of interest (LRS) in these squirrels.
```{r, eval = has_openmx}
for (i in seq_len(n_families)) {
ped_i <- subset(ped_krsp_subset, famID == id_families[i])
phenotypic_ids <- ped_i$ID[!is.na(ped_i$lrs)]
A_i <- as.matrix(ped2add(ped_i, sparse = FALSE, keep_ids = phenotypic_ids))
Cn_i <- as.matrix(ped2cn(ped_i, sparse = FALSE, keep_ids = phenotypic_ids))
Mt_i <- as.matrix(ped2mit(ped_i, sparse = FALSE, keep_ids = phenotypic_ids))
n_i <- nrow(A_i)
id_order_i <- rownames(A_i)
pheno_vals <- ped_i$lrs[match(id_order_i, as.character(ped_i$ID))]
obs_ids_i <- make.names(id_order_i[!is.na(pheno_vals)])
pheno_row_i <- matrix(as.double(pheno_vals[!is.na(pheno_vals)]),
nrow = 1,
dimnames = list(NULL, obs_ids_i)
)
rownames(A_i) <- colnames(A_i) <- obs_ids_i
rownames(Cn_i) <- colnames(Cn_i) <- obs_ids_i
rownames(Mt_i) <- colnames(Mt_i) <- obs_ids_i
add_list[[i]] <- A_i
cn_list[[i]] <- Cn_i
mt_list[[i]] <- Mt_i
obs_ids_list[[i]] <- obs_ids_i
pheno_list[[i]] <- pheno_row_i
}
```
For convenience, we build the group models for each family separately for the ACE, MACE, and CE models. This allows us to easily compare the models and assess the contribution of each variance component to the trait of interest (LRS) in these squirrels. The `buildOneFamilyGroup()` function is used to create the model specification for each family, and the `buildPedigreeMx()` function is used to combine these group models into a single multigroup model that can be fitted in OpenMx.
```{r, eval = has_openmx}
group_models_mace <- lapply(seq_len(n_families), function(i) {
buildOneFamilyGroup(
group_name = paste0("ped", i),
Addmat = add_list[[i]],
Nucmat = cn_list[[i]],
Mtdmat = mt_list[[i]],
full_df_row = pheno_list[[i]],
obs_ids = obs_ids_list[[i]]
)
})
group_models_ace <- lapply(seq_len(n_families), function(i) {
buildOneFamilyGroup(
group_name = paste0("ped", i),
Addmat = add_list[[i]],
Nucmat = cn_list[[i]],
Mtdmat = NULL,
full_df_row = pheno_list[[i]],
obs_ids = obs_ids_list[[i]]
)
})
group_models_ce <- lapply(seq_len(n_families), function(i) {
buildOneFamilyGroup(
group_name = paste0("ped", i),
Addmat = NULL,
Nucmat = cn_list[[i]],
Mtdmat = NULL,
full_df_row = pheno_list[[i]],
obs_ids = obs_ids_list[[i]]
)
})
```
The `lapply()` function is used to create a list of group models for each family, which are then combined into a single multigroup model using the buildPedigreeMx() function. The resulting models can then be fitted in OpenMx to estimate the variance components for each family and compare the ACE, MACE, and CE models.
```{r, eval = has_openmx}
multi_model_mace <- buildPedigreeMx(
model_name = "MultiPedigreeModel",
vars = start_vars,
group_models = group_models_mace,
ci = TRUE
)
multi_model_ace <- buildPedigreeMx(
model_name = "MultiPedigreeModel",
vars = start_vars,
group_models = group_models_ace,
ci = TRUE
)
multi_model_ce <- buildPedigreeMx(
model_name = "MultiPedigreeModel",
vars = start_vars,
group_models = group_models_ce,
ci = TRUE
)
```
## Model Fitting
```{r, include = FALSE}
fitted_multi_ace <- NULL # ensure variable exists for later use
fitted_multi_mace <- NULL # ensure variable exists for later use
fitted_multi_ce <- NULL # ensure variable exists for later use
```
```{r, eval = run_models && has_openmx, include = FALSE}
# these models take a while to fit, so we save the fitted models as RDS files for later use. If the RDS files already exist, we can skip fitting the models and just load the fitted models from the RDS files.
fitted_multi_ace <- mxRun(multi_model_ace,
intervals = TRUE
)
saveRDS(fitted_multi_ace, "inst/extdata/fitted_multi_ace.rds")
fitted_multi_mace <- mxRun(multi_model_mace, intervals = TRUE)
saveRDS(fitted_multi_mace, "inst/extdata/fitted_multi_mace.rds")
fitted_multi_ce <- mxRun(multi_model_ce, intervals = TRUE)
saveRDS(fitted_multi_ce, "inst/extdata/fitted_multi_ce.rds")
```
```{r, eval = FALSE && has_openmx, include = FALSE}
# If the models do not converge well, we can use mxTryHard() to attempt to find better-fitting solutions. This can be especially useful when fitting complex models with many families and variance components, as convergence can sometimes be challenging. The mxTryHard() function will try different optimization settings and starting values to improve convergence. We also save the results of mxTryHard() as RDS files for later use.
tt_ace <- mxTryHard(fitted_multi_ace, intervals = TRUE)
saveRDS(tt_ace, "inst/extdata/fitted_multi_ace_tryhard.rds")
tt_mace <- mxTryHard(fitted_multi_mace, intervals = TRUE)
saveRDS(tt_mace, "inst/extdata/fitted_multi_mace_tryhard.rds")
tt_ce <- mxTryHard(fitted_multi_ce, intervals = TRUE)
saveRDS(tt_ce, "inst/extdata/fitted_multi_ce_tryhard.rds")
```
Note that fitting these models can take some time, especially with many families and large pedigrees. The `mxTryHard()` function can be used to attempt to find better-fitting solutions if the initial optimization does not converge well. In practice, you may want to experiment with different starting values or optimization settings to improve convergence.
```
fitted_multi_mace <- mxRun(multi_model_mace)
fitted_multi_ace <- mxRun(multi_model_ace)
fitted_multi_ce <- mxRun(multi_model_ce)
```
```{r, include = FALSE, eval = !run_models && has_openmx}
fitted_multi_ace <- readRDS(rds_file_ace)
fitted_multi_mace <- readRDS(rds_file_mace)
fitted_multi_ce <- readRDS(rds_file_ce)
```
### MACE Model Results
As you can see, we have fit a multigroup pedigree model using `r n_families` families of several thousand squirrels. The model includes additive genetic variance (Vad), common nuclear environmental variance (Vcn), and unique environmental variance (Ver). The mitochondrial variance component (Vmt) was included in the MACE model but not estimated in the ACE model. The results show the proportion of total variance attributed to each component, which can be interpreted as heritability and environmental contributions to the phenotype of interest (LRS) in these
squirrels.
```{r, eval = has_openmx && !is.null(fitted_multi_mace), echo = TRUE}
summary(fitted_multi_mace)
summary(fitted_multi_mace)$CI
total_var_mace <- sum(
fitted_multi_mace$ModelOne$Vad$values,
fitted_multi_mace$ModelOne$Vcn$values,
fitted_multi_mace$ModelOne$Vmt$values,
fitted_multi_mace$ModelOne$Ver$values
)
```
```{r total-fam-results-mace, eval = has_openmx && !is.null(fitted_multi_ace)}
cat("Additive genetic (Vad):", fitted_multi_mace$ModelOne$Vad$values / total_var_mace, "\n")
cat("Common nuclear (Vcn):", fitted_multi_mace$ModelOne$Vcn$values / total_var_mace, "\n")
cat("Mitochondrial (Vmt):", fitted_multi_mace$ModelOne$Vmt$values / total_var_mace, "\n")
cat("Unique environ. (Ver):", fitted_multi_mace$ModelOne$Ver$values / total_var_mace, "\n")
```
As you can see, the MACE model includes an additional variance component for mitochondrial effects (Vmt), which allows us to assess the contribution of mitochondrial inheritance to the trait of interest (LRS) in these squirrels. The results show the proportion of total variance attributed to each component, which can be interpreted as heritability and environmental contributions to LRS in these squirrels. We can also compare the ACE and MACE models to see if including the mitochondrial component changes our estimates of heritability and environmental contributions.
### ACE Model Results
Now we can look at the results from the ACE model, which does not include the mitochondrial component. This allows us to compare the estimates of additive genetic and common nuclear environmental variance between the ACE and MACE models, and assess whether including the mitochondrial component changes our estimates of heritability and environmental contributions to LRS in these squirrels.
```{r, eval = has_openmx && !is.null(fitted_multi_ace)}
summary(fitted_multi_ace)
summary(fitted_multi_ace, verbose = T)$CI
#summary(fitted_multi_ace)$CI
total_var_ace <- sum(
fitted_multi_ace$ModelOne$Vad$values,
fitted_multi_ace$ModelOne$Vcn$values,
# fitted_multi_ace$ModelOne$Vmt$values,
fitted_multi_ace$ModelOne$Ver$values
)
```
As you can see, the ACE model includes variance components for additive genetic effects (Vad), common nuclear environmental effects (Vcn), and unique environmental effects (Ver), but does not include a component for mitochondrial effects (Vmt). The results show the proportion of total variance attributed to each component, which can be interpreted as heritability and environmental contributions to LRS in these squirrels. We can compare these estimates to those from the MACE model to assess the contribution of mitochondrial inheritance to LRS in these squirrels.
```{r total-fam-results, eval = has_openmx && !is.null(fitted_multi_ace)}
cat("Additive genetic (Vad):", fitted_multi_ace$ModelOne$Vad$values / total_var_ace, "\n")
cat("Common nuclear (Vcn):", fitted_multi_ace$ModelOne$Vcn$values / total_var_ace, "\n")
cat("Unique environ. (Ver):", fitted_multi_ace$ModelOne$Ver$values / total_var_ace, "\n")
```
### CE Model Results
```{r, eval = has_openmx && !is.null(fitted_multi_ce)}
summary(fitted_multi_ce)
summary(fitted_multi_ce, verbose = T)$CI
total_var_ce <- sum(
# fitted_multi_ce$ModelOne$Vad$values,
fitted_multi_ce$ModelOne$Vcn$values,
# fitted_multi_ce$ModelOne$Vmt$values,
fitted_multi_ce$ModelOne$Ver$values
)
```
As you can see, the CE model includes variance components for common nuclear environmental effects (Vcn) and unique environmental effects (Ver), but does not include a component for additive genetic effects (Vad) or mitochondrial effects (Vmt). The results show the proportion of total variance attributed to each component, which can be interpreted as environmental contributions to LRS in these squirrels. We can compare these estimates to those from the ACE and MACE models to assess the contribution of additive genetic and mitochondrial inheritance to LRS in these squirrels.
```{r total-fam-results-ce, eval = has_openmx && !is.null(fitted_multi_ce)}
cat("Common nuclear (Vcn):", fitted_multi_ce$ModelOne$Vcn$values / total_var_ce, "\n")
cat("Unique environ. (Ver):", fitted_multi_ce$ModelOne$Ver$values/ total_var_ce, "\n")
```
### Model Comparison
All of these models are nested, with the CE model being the most constrained (only common environment and unique environment), the ACE model including additive genetic effects, and the MACE model including both additive genetic effects and mitochondrial effects.
Now we can compare the ACE and MACE models to see if including the mitochondrial component changes our estimates of heritability and environmental contributions. This can provide insights into the role of mitochondrial inheritance in the trait of interest (LRS) in these squirrels. In the original paper, the authors found that a maternally inherited component (which could be due to mitochondria or maternal effects) explained a significant portion of the variance in LRS,.
```{r, eval = has_openmx && !is.null(fitted_multi_ace) && !is.null(fitted_multi_mace)}
mxCompare(fitted_multi_mace, fitted_multi_ace)
```
However, as you can see when we compare the ACE and MACE models, the inclusion of the mitochondrial component does not substantially change the estimates of additive genetic and common nuclear environmental variance, suggesting that the mitochondrial component may not be a major contributor to LRS in this dataset.
```{r, eval = has_openmx && !is.null(fitted_multi_ace) && !is.null(fitted_multi_ce)}
mxCompare(fitted_multi_ace, fitted_multi_ce)
```
When we compare the ACE and CE models, we see that the ACE model does not fit significantly better than the CE model. This suggests that the additive genetic component may not be a major contributor to LRS in this dataset, and that the variance in LRS may be primarily due to environmental factors. However, it's important to interpret these results in the context of the specific dataset and trait being analyzed, and to consider other factors such as sample size, pedigree structure, and potential confounding variables that could influence the estimates of variance components.