Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA
Research output: Book/Report › Ph.D. thesis › Research
Standard
Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA. / Korneliussen, Thorfinn Sand.
Natural History Museum of Denmark, Faculty of Science, University of Copenhagen, 2015. 109 p.
RIS
TY - BOOK
T1 - Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA
AU - Korneliussen, Thorfinn Sand
PY - 2015
Y1 - 2015
N2 - Due to recent advances in DNA sequencing technology, genomic data are being generated at an unprecedented rate, and we are gaining access to entire genomes at the population level. The technology does not, however, give direct access to genetic variation. The many levels of preprocessing required before inferences can be made from the data introduce multiple levels of uncertainty, especially for low-depth data. Methods that take this inherent uncertainty into account are therefore needed to make robust inferences in the downstream analysis of such data. This poses a problem for a range of key summary statistics in population genetics, where existing methods assume that the true genotypes are known. Motivated by this, I present: 1) a new method for estimating relatedness between pairs of individuals, 2) a new method for estimating neutrality test statistics, which are commonly used to find genomic regions that have been under natural selection, 3) a new method for estimating individual admixture proportions, which can be used to detect population structure, and 4) a general framework for the analysis of high-throughput sequencing data. These methods are all based on the concept of genotype likelihoods, which quantify the uncertainty in the data. We show, both through simulations and with real high-throughput sequencing data, that for low-depth data our methods outperform existing approaches that assume known genotypes. The new methods are implemented in fast multi-threaded programs, which have been made freely available to the scientific community and have already been successfully applied in many different studies.
UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122654311305763
M3 - Ph.D. thesis
BT - Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA
PB - Natural History Museum of Denmark, Faculty of Science, University of Copenhagen
ER -
ID: 162938150
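The abstract's central idea, working with genotype likelihoods instead of called genotypes, can be illustrated with a minimal Python sketch. It assumes a simple per-base error model: each base is wrong with the probability given by its Phred quality, and an error is equally likely to produce any of the three other bases. It also assumes Hardy-Weinberg proportions. This is a textbook simplification, not the model or code used in the thesis or its programs, and the function names are hypothetical. The sketch computes the likelihood of the observed reads under each of the three diploid genotypes. It then estimates the population allele frequency by expectation-maximization, so that each individual contributes its expected allele count rather than a hard genotype call.

```python
"""Minimal sketch: genotype likelihoods and an EM allele-frequency estimate.

Hypothetical helper names; simplified error model, not the thesis's code.
"""


def phred_to_error(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)


def genotype_likelihoods(bases, quals, ref, alt):
    """Return [P(reads|G=0), P(reads|G=1), P(reads|G=2)], G = copies of `alt`.

    Assumes independent reads, each read drawn from either chromosome with
    probability 1/2, and errors spread evenly over the three other bases.
    Plain products can underflow at high depth; real tools use log space.
    """
    likes = [1.0, 1.0, 1.0]
    for base, q in zip(bases, quals):
        e = phred_to_error(q)
        p_ref = 1 - e if base == ref else e / 3  # P(base | true base is ref)
        p_alt = 1 - e if base == alt else e / 3  # P(base | true base is alt)
        likes[0] *= p_ref
        likes[1] *= 0.5 * p_ref + 0.5 * p_alt
        likes[2] *= p_alt
    return likes


def estimate_allele_frequency(gls, n_iter=200, tol=1e-10):
    """EM estimate of the alt-allele frequency from per-individual genotype
    likelihoods, using Hardy-Weinberg proportions as the genotype prior.
    Genotypes are never called: each individual contributes its posterior
    expected number of alt alleles."""
    f = 0.25  # arbitrary starting value strictly between 0 and 1
    for _ in range(n_iter):
        prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
        expected_alt = 0.0
        for gl in gls:
            w = [lk * p for lk, p in zip(gl, prior)]  # unnormalised posterior
            expected_alt += (w[1] + 2 * w[2]) / sum(w)
        new_f = expected_alt / (2 * len(gls))
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


if __name__ == "__main__":
    # Three low-depth individuals at one site (ref A, alt G); illustrative data.
    reads = [("AA", [30, 30]), ("AG", [20, 30]), ("G", [30])]
    gls = [genotype_likelihoods(b, q, "A", "G") for b, q in reads]
    for gl in gls:
        print(["%.3g" % x for x in gl])
    print("estimated alt frequency: %.3f" % estimate_allele_frequency(gls))
```

With only one or two reads per individual, the heterozygous and homozygous likelihoods often differ by only about a factor of two (a single high-quality alt read gives roughly 0.5 versus 0.999). Calling genotypes would discard exactly this uncertainty.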