DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data. / Zhao, Lei; Nielsen, Rasmus; Korneliussen, Thorfinn Sand.
In: Molecular Biology and Evolution, Vol. 39, No. 6, msac119, 2022.
RIS
TY - JOUR
T1 - DistAngsd
T2 - Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data
AU - Zhao, Lei
AU - Nielsen, Rasmus
AU - Korneliussen, Thorfinn Sand
N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
PY - 2022
Y1 - 2022
AB - Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and can generally not accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters, while the uncertainty in genotype calling, arising as a consequence of the sequencing technology, is ignored. In order to address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behaviors than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.
KW - Expectation maximization
KW - Genetic distance
KW - Genotype likelihood
KW - High-throughput sequencing
KW - Maximum likelihood
KW - Molecular evolution
KW - Next-generation sequencing
KW - Phylogeny reconstruction
U2 - 10.1093/molbev/msac119
DO - 10.1093/molbev/msac119
M3 - Journal article
C2 - 35647675
AN - SCOPUS:85133102906
VL - 39
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
SN - 0737-4038
IS - 6
M1 - msac119
ER -
ID: 315860700