| 1 | | -# Generation times |
| 1 | +# Human generation times across the past 250,000 years |
| 2 | + |
| 3 | +The generation times of our recent ancestors can tell us about both the biology and social organization of prehistoric humans, placing human evolution on an absolute timescale. We implement a method for predicting historical male and female generation times based on changes in the mutation spectrum. |
| 4 | + |
| 5 | +Our method combines data from two different types of studies: |
| 6 | +- **Mutations from pedigree studies** <br> |
| 7 | +We apply a Dirichlet-multinomial regression to mutation count data to capture the relationship between the underlying mutation spectrum and parental ages. |
| 8 | +- **Variants from population genetic studies** <br> |
| 9 | +We use human variants from the 1000 Genomes Project with allele ages estimated from the Genealogical Estimation of Variant Age (GEVA) approach. |
| 10 | + |
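The Dirichlet-multinomial regression mentioned above can be illustrated with a minimal sketch. It is written in Python for illustration only; the repository's own scripts are in R. The log-linear link from parental ages to concentration parameters, the function names, and the coefficient values are all assumptions and are not taken from `age_modeling.R`.

```python
import math

def concentration(paternal_age, maternal_age, coefs):
    """Hypothetical log-linear link: one (intercept, paternal slope, maternal slope)
    triple per mutation class, mapped to a positive Dirichlet concentration."""
    return [math.exp(b0 + bp * paternal_age + bm * maternal_age)
            for b0, bp, bm in coefs]

def dm_logpmf(counts, alpha):
    """Log-probability of mutation counts under a Dirichlet-multinomial
    distribution with concentration parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(a) + math.lgamma(n + 1) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

# Made-up coefficients for three mutation classes (illustration only).
example_coefs = [(1.0, 0.02, 0.01), (0.5, 0.01, 0.03), (0.2, 0.00, 0.02)]
alpha = concentration(30, 28, example_coefs)
loglik = dm_logpmf([12, 7, 4], alpha)
```

Fitting the regression would then amount to choosing the coefficients that maximize the summed log-probability over all trios in the pedigree data.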
| 11 | +## Summary of analysis workflow: |
| 12 | +1. Preprocess 1000 Genomes variant data with ages from GEVA |
| 13 | +2. Count binned variants, including for each continental population |
| 14 | +3. Load mutation data and build Dirichlet-multinomial model |
| 15 | +4. Estimate best-fit parental ages for variant spectrum in each bin |
| 16 | + |
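Step 4 of the workflow above can be sketched as a grid search over paternal and maternal ages for one time bin. This is a simplified illustration in Python, not the repository's R code: it swaps in a plain multinomial likelihood in place of the Dirichlet-multinomial model, and the coefficients, the age grid, and every function name are hypothetical.

```python
import math
from itertools import product

# Hypothetical (intercept, paternal slope, maternal slope) per mutation class.
COEFS = [(0.0, 0.010, 0.002), (-0.5, 0.004, 0.012), (-1.0, -0.003, 0.001)]

def predicted_spectrum(paternal_age, maternal_age, coefs=COEFS):
    """Expected mutation-class proportions for given parental ages (softmax link)."""
    logits = [b0 + bp * paternal_age + bm * maternal_age for b0, bp, bm in coefs]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood up to a constant that does not depend on probs."""
    return sum(c * math.log(p) for c, p in zip(counts, probs))

def best_fit_ages(counts, ages=range(15, 51)):
    """Return the (paternal, maternal) age pair on the grid that best explains
    the variant counts observed in one bin."""
    return max(product(ages, ages),
               key=lambda pm: multinomial_loglik(counts, predicted_spectrum(*pm)))

# Synthetic bin whose spectrum was generated at ages (32, 27).
synthetic_counts = [1e6 * p for p in predicted_spectrum(32, 27)]
fitted = best_fit_ages(synthetic_counts)
```

Repeating the search for each bin of variant ages gives one pair of parental ages per bin, which is the shape of output the workflow describes.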
| 17 | +## Brief descriptions of folders and files in the top level of the repository: |
| 18 | +### folders |
| 19 | +- bootstraps/<br> |
| 20 | +Recalculate estimates for each 100x100 double-bootstrap of model and variants |
| 21 | +- neanderthal_masked/<br> |
| 22 | +Reanalysis masking genomic tracts with potential Neanderthal introgression |
| 23 | +- resample_alleleage/<br> |
| 24 | +Reanalysis after drawing new allele ages based on 95% CI from GEVA |
| 25 | +- var_count/<br> |
| 26 | +Preprocess variant data, bin variants, and count each mutation class |
| 27 | +### files |
| 28 | +* age_modeling.R<br> |
| 29 | +Loads mutation data and builds the probabilistic model for estimating parental ages |
| 30 | +* analyze_main.R<br> |
| 31 | +Analysis script for main plots; depends on age_modeling.R and plot_helper.R |
| 32 | +* analyze_populations.R<br> |
| 33 | +Analysis script for separate continental human populations |
| 34 | +* calculate_SSE.R<br> |
| 35 | +Calculate the sum of squared error (SSE) for generation time estimates |
| 36 | +* cross_validation.R<br> |
| 37 | +Short script for calculating sample variance SSE |
| 38 | +* plot_helper.R<br> |
| 39 | +Auxiliary scripts for shaping output and plotting |
| 40 | +* recombination_analysis.R<br> |
| 41 | +Investigation of connection between recombination rate and mutation spectrum |
| 42 | +* sim_famvariance.R<br> |
| 43 | +Simulate variance in parental ages and calculate SSE |