Dear Author,
I am using qpDstat from the ADMIXTOOLS package to calculate D-statistics among several rodent species. Many of my results show a Z-score of 100, and most tests have |Z| > 3. I would like to ask whether this pattern is expected or whether it indicates a problem in my setup.
Here are some details about my data and parameters:
Reference genome size: ~2.3 Gb
Number of SNPs used: 278,518,514 (quite large)
Parameter file: blgsize: 0.01
Outgroup: a relatively distant species
Tests run: all possible quartets (W, X, Y, Z), not only topologies consistent with the species tree
My questions are:
Is it normal to get Z = 100 for many tests, or does this indicate numerical saturation (e.g., SE(D) too small)?
Should I increase the block size (e.g., blgsize: 0.05 or larger) to avoid unrealistically small SE values?
Would it be more appropriate to limit the tests to quartets consistent with the species tree, instead of testing all possible combinations?
Could the high Z-scores result from using too distant an outgroup or from excessive divergence among species? In that case, would you recommend restricting the D-statistic tests within clades and choosing the nearest outgroup for each trio?
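To make the block-size question concrete, here is a minimal sketch of how I understand the jackknife standard error to depend on block size. It is a simplified delete-one-block jackknife run on simulated data, not qpDstat's actual implementation. The function name `block_jackknife_d`, the simulated linkage structure, and the block sizes are all my own assumptions for illustration.

```python
import numpy as np

def block_jackknife_d(num, den, block_ids):
    """Simplified delete-one-block jackknife for D = sum(num) / sum(den).

    num, den  : per-SNP numerator and denominator terms of the D-statistic
    block_ids : integer block label for each SNP
    Returns (D, SE, Z). Assumption: unweighted jackknife variance formula.
    """
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    _, inverse = np.unique(block_ids, return_inverse=True)
    n_blocks = inverse.max() + 1

    # Per-block sums of numerator and denominator
    num_b = np.bincount(inverse, weights=num, minlength=n_blocks)
    den_b = np.bincount(inverse, weights=den, minlength=n_blocks)

    d_full = num_b.sum() / den_b.sum()
    # D recomputed with each block left out in turn
    d_loo = (num_b.sum() - num_b) / (den_b.sum() - den_b)
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((d_loo - d_loo.mean()) ** 2))
    return d_full, se, d_full / se

# Toy data (assumption): 200,000 SNPs in which runs of 1,000 neighbouring
# SNPs share a common signal, mimicking linkage between nearby sites.
rng = np.random.default_rng(0)
n_snps, run_len = 200_000, 1_000
shared = np.repeat(rng.normal(0.0, 0.05, n_snps // run_len), run_len)
num = shared + rng.normal(0.0, 0.05, n_snps)
den = np.ones(n_snps)
snp_index = np.arange(n_snps)

# Compare blocks shorter than the linked runs with blocks longer than them
for snps_per_block in (100, 2_000):
    d, se, z = block_jackknife_d(num, den, snp_index // snps_per_block)
    print(f"{snps_per_block:>5} SNPs/block: D={d:.4f}  SE={se:.5f}  Z={z:.2f}")
```

In this toy setting, blocks shorter than the linked runs give a smaller SE and therefore a larger |Z| for the same D, which is the effect I suspect may be happening in my data.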
Any guidance on how to interpret these large Z-scores and how to adjust parameters or filtering strategies would be greatly appreciated.
Thank you very much for your time and for maintaining this excellent tool.
Best regards,
Na Wan