DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data

Research output: Contribution to journal; Journal article; Research; peer-review

Standard

DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data. / Zhao, Lei; Nielsen, Rasmus; Korneliussen, Thorfinn Sand.

In: Molecular Biology and Evolution, Vol. 39, No. 6, msac119, 2022.

Harvard

Zhao, L, Nielsen, R & Korneliussen, TS 2022, 'DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data', Molecular Biology and Evolution, vol. 39, no. 6, msac119. https://doi.org/10.1093/molbev/msac119

APA

Zhao, L., Nielsen, R., & Korneliussen, T. S. (2022). DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data. Molecular Biology and Evolution, 39(6), [msac119]. https://doi.org/10.1093/molbev/msac119

Vancouver

Zhao L, Nielsen R, Korneliussen TS. DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data. Molecular Biology and Evolution. 2022;39(6). msac119. https://doi.org/10.1093/molbev/msac119

Author

Zhao, Lei; Nielsen, Rasmus; Korneliussen, Thorfinn Sand. / DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data. In: Molecular Biology and Evolution. 2022; Vol. 39, No. 6.

Bibtex

@article{b48280cf98484afdbbf2107e73296ea7,
title = "DistAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data",
abstract = "Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and can generally not accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters; while the uncertainty in genotype calling, arising as a consequence of the sequencing technology, is ignored. In order to address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behaviors than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.",
keywords = "Expectation maximization, Genetic distance, Genotype likelihood, High-throughput sequencing, Maximum likelihood, Molecular evolution, Next-generation sequencing, Phylogeny reconstruction",
author = "Lei Zhao and Rasmus Nielsen and Korneliussen, {Thorfinn Sand}",
note = "Publisher Copyright: {\textcopyright} 2022 The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.",
year = "2022",
doi = "10.1093/molbev/msac119",
language = "English",
volume = "39",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "6",

}

RIS

TY - JOUR

T1 - DistAngsd

T2 - Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data

AU - Zhao, Lei

AU - Nielsen, Rasmus

AU - Korneliussen, Thorfinn Sand

N1 - Publisher Copyright: © 2022 The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.

PY - 2022

Y1 - 2022

N2 - Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and can generally not accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters; while the uncertainty in genotype calling, arising as a consequence of the sequencing technology, is ignored. In order to address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behaviors than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.

AB - Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and can generally not accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters; while the uncertainty in genotype calling, arising as a consequence of the sequencing technology, is ignored. In order to address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behaviors than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.

KW - Expectation maximization

KW - Genetic distance

KW - Genotype likelihood

KW - High-throughput sequencing

KW - Maximum likelihood

KW - Molecular evolution

KW - Next-generation sequencing

KW - Phylogeny reconstruction

U2 - 10.1093/molbev/msac119

DO - 10.1093/molbev/msac119

M3 - Journal article

C2 - 35647675

AN - SCOPUS:85133102906

VL - 39

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 6

M1 - msac119

ER -