Ascertainment bias in studies of human genome-wide polymorphism

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Ascertainment bias in studies of human genome-wide polymorphism. / Clark, Andrew G.; Hubisz, Melissa J.; Bustamante, Carlos D.; Williamson, Scott H.; Nielsen, Rasmus.

In: Genome Research, Vol. 15, No. 11, 2005, p. 1496-1502.

Harvard

Clark, AG, Hubisz, MJ, Bustamante, CD, Williamson, SH & Nielsen, R 2005, 'Ascertainment bias in studies of human genome-wide polymorphism', Genome Research, vol. 15, no. 11, pp. 1496-1502. https://doi.org/10.1101/gr.4107905

APA

Clark, A. G., Hubisz, M. J., Bustamante, C. D., Williamson, S. H., & Nielsen, R. (2005). Ascertainment bias in studies of human genome-wide polymorphism. Genome Research, 15(11), 1496-1502. https://doi.org/10.1101/gr.4107905

Vancouver

Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. Ascertainment bias in studies of human genome-wide polymorphism. Genome Research. 2005;15(11):1496-1502. https://doi.org/10.1101/gr.4107905

Author

Clark, Andrew G.; Hubisz, Melissa J.; Bustamante, Carlos D.; Williamson, Scott H.; Nielsen, Rasmus. / Ascertainment bias in studies of human genome-wide polymorphism. In: Genome Research. 2005; Vol. 15, No. 11. pp. 1496-1502.

Bibtex

@article{215d50d074c311dbbee902004c4f4f50,
title = "Ascertainment bias in studies of human genome-wide polymorphism",
abstract = "Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This {"}SNP discovery{"} sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs-including BAC end sequences, shotgun reads, and differences between public and private sequences-and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and FST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.",
author = "Clark, {Andrew G.} and Hubisz, {Melissa J.} and Bustamante, {Carlos D.} and Williamson, {Scott H.} and Nielsen, Rasmus",
year = "2005",
doi = "10.1101/gr.4107905",
language = "English",
volume = "15",
pages = "1496--1502",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "11",
}

RIS

TY - JOUR

T1 - Ascertainment bias in studies of human genome-wide polymorphism

AU - Clark, Andrew G.

AU - Hubisz, Melissa J.

AU - Bustamante, Carlos D.

AU - Williamson, Scott H.

AU - Nielsen, Rasmus

PY - 2005

Y1 - 2005

N2 - Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This "SNP discovery" sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs-including BAC end sequences, shotgun reads, and differences between public and private sequences-and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and FST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.

AB - Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This "SNP discovery" sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs-including BAC end sequences, shotgun reads, and differences between public and private sequences-and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and FST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.

U2 - 10.1101/gr.4107905

DO - 10.1101/gr.4107905

M3 - Journal article

C2 - 16251459

VL - 15

SP - 1496

EP - 1502

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 11

ER -
