Evaluation of methods for estimating coalescence times using ancestral recombination graphs

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Evaluation of methods for estimating coalescence times using ancestral recombination graphs. / Brandt, Débora Y. C.; Wei, Xinzhu; Deng, Yun; Vaughn, Andrew H.; Nielsen, Rasmus.

In: Genetics, Vol. 221, No. 1, iyac044, 2022.

Harvard

Brandt, DYC, Wei, X, Deng, Y, Vaughn, AH & Nielsen, R 2022, 'Evaluation of methods for estimating coalescence times using ancestral recombination graphs', Genetics, vol. 221, no. 1, iyac044. https://doi.org/10.1093/genetics/iyac044

APA

Brandt, D. Y. C., Wei, X., Deng, Y., Vaughn, A. H., & Nielsen, R. (2022). Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics, 221(1), [iyac044]. https://doi.org/10.1093/genetics/iyac044

Vancouver

Brandt DYC, Wei X, Deng Y, Vaughn AH, Nielsen R. Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics. 2022;221(1):iyac044. https://doi.org/10.1093/genetics/iyac044

Author

Brandt, Débora Y. C.; Wei, Xinzhu; Deng, Yun; Vaughn, Andrew H.; Nielsen, Rasmus. / Evaluation of methods for estimating coalescence times using ancestral recombination graphs. In: Genetics. 2022; Vol. 221, No. 1.

Bibtex

@article{6b09e87d7f1d411289bba2fc8e58eda9,
title = "Evaluation of methods for estimating coalescence times using ancestral recombination graphs",
abstract = "The ancestral recombination graph is a structure that describes the joint genealogies of sampled DNA sequences along the genome. Recent computational methods have made impressive progress toward scalably estimating whole-genome genealogies. In addition to inferring the ancestral recombination graph, some of these methods can also provide ancestral recombination graphs sampled from a defined posterior distribution. Obtaining good samples of ancestral recombination graphs is crucial for quantifying statistical uncertainty and for estimating population genetic parameters such as effective population size, mutation rate, and allele age. Here, we use standard neutral coalescent simulations to benchmark the estimates of pairwise coalescence times from 3 popular ancestral recombination graph inference programs: ARGweaver, Relate, and tsinfer+tsdate. We compare (1) the true coalescence times to the inferred times at each locus; (2) the distribution of coalescence times across all loci to the expected exponential distribution; (3) whether the sampled coalescence times have the properties expected of a valid posterior distribution. We find that inferred coalescence times at each locus are most accurate in ARGweaver, and often more accurate in Relate than in tsinfer+tsdate. However, all 3 methods tend to overestimate small coalescence times and underestimate large ones. Lastly, the posterior distribution of ARGweaver is closer to the expected posterior distribution than Relate's, but this higher accuracy comes at a substantial trade-off in scalability. The best choice of method will depend on the number and length of input sequences and on the goal of downstream analyses, and we provide guidelines for the best practices.",
keywords = "ancestral recombination graph, ARGweaver, calibration, Relate, simulation, tsdate, tsinfer",
author = "Brandt, D{\'e}bora Y. C. and Wei, Xinzhu and Deng, Yun and Vaughn, Andrew H. and Nielsen, Rasmus",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com.",
year = "2022",
doi = "10.1093/genetics/iyac044",
language = "English",
volume = "221",
journal = "Genetics",
issn = "1943-2631",
publisher = "The Genetics Society of America (GSA)",
number = "1",
pages = "iyac044",
}

RIS

TY - JOUR

T1 - Evaluation of methods for estimating coalescence times using ancestral recombination graphs

AU - Brandt, Débora Y. C.

AU - Wei, Xinzhu

AU - Deng, Yun

AU - Vaughn, Andrew H.

AU - Nielsen, Rasmus

N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com.

PY - 2022

Y1 - 2022

AB - The ancestral recombination graph is a structure that describes the joint genealogies of sampled DNA sequences along the genome. Recent computational methods have made impressive progress toward scalably estimating whole-genome genealogies. In addition to inferring the ancestral recombination graph, some of these methods can also provide ancestral recombination graphs sampled from a defined posterior distribution. Obtaining good samples of ancestral recombination graphs is crucial for quantifying statistical uncertainty and for estimating population genetic parameters such as effective population size, mutation rate, and allele age. Here, we use standard neutral coalescent simulations to benchmark the estimates of pairwise coalescence times from 3 popular ancestral recombination graph inference programs: ARGweaver, Relate, and tsinfer+tsdate. We compare (1) the true coalescence times to the inferred times at each locus; (2) the distribution of coalescence times across all loci to the expected exponential distribution; (3) whether the sampled coalescence times have the properties expected of a valid posterior distribution. We find that inferred coalescence times at each locus are most accurate in ARGweaver, and often more accurate in Relate than in tsinfer+tsdate. However, all 3 methods tend to overestimate small coalescence times and underestimate large ones. Lastly, the posterior distribution of ARGweaver is closer to the expected posterior distribution than Relate's, but this higher accuracy comes at a substantial trade-off in scalability. The best choice of method will depend on the number and length of input sequences and on the goal of downstream analyses, and we provide guidelines for the best practices.

KW - ancestral recombination graph

KW - ARGweaver

KW - calibration

KW - Relate

KW - simulation

KW - tsdate

KW - tsinfer

U2 - 10.1093/genetics/iyac044

DO - 10.1093/genetics/iyac044

M3 - Journal article

C2 - 35333304

AN - SCOPUS:85129997792

VL - 221

JO - Genetics

JF - Genetics

SN - 1943-2631

IS - 1

M1 - iyac044

ER -
