Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process. / Ramírez-Soriano, Anna; Nielsen, Rasmus.
In: Genetics, Vol. 181, No. 2, 2009, p. 701-10.
RIS
TY - JOUR
T1 - Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process
AU - Ramírez-Soriano, Anna
AU - Nielsen, Rasmus
N1 - Keywords: Alleles; Analysis of Variance; Bias (Epidemiology); Biometry; Computer Simulation; Databases, Genetic; Genetics, Population; Genome, Human; Genotype; Humans; Models, Genetic; Mutation; Polymorphism, Single Nucleotide
PY - 2009
Y1 - 2009
AB - Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter theta = 4N(e)mu (N(e), effective population size; mu, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.
U2 - 10.1534/genetics.108.094060
DO - 10.1534/genetics.108.094060
M3 - Journal article
C2 - 19087964
VL - 181
SP - 701
EP - 710
JO - Genetics
JF - Genetics
SN - 1943-2631
IS - 2
ER -
ID: 21332666