Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. / Yang, Ziheng; Wong, Wendy Shuk Wan; Nielsen, Rasmus.

In: Molecular Biology and Evolution, Vol. 22, No. 4, 2005, p. 1107-1118.

Harvard

Yang, Z, Wong, WSW & Nielsen, R 2005, 'Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection', Molecular Biology and Evolution, vol. 22, no. 4, pp. 1107-1118. https://doi.org/10.1093/molbev/msi097

APA

Yang, Z., Wong, W. S. W., & Nielsen, R. (2005). Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. Molecular Biology and Evolution, 22(4), 1107-1118. https://doi.org/10.1093/molbev/msi097

Vancouver

Yang Z, Wong WSW, Nielsen R. Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. Molecular Biology and Evolution. 2005;22(4):1107-1118. https://doi.org/10.1093/molbev/msi097

Author

Yang, Ziheng ; Wong, Wendy Shuk Wan ; Nielsen, Rasmus. / Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. In: Molecular Biology and Evolution. 2005 ; Vol. 22, No. 4. pp. 1107-1118.

Bibtex

@article{23d57cc074c311dbbee902004c4f4f50,
title = "Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection",
abstract = "Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.",
author = "Yang, Ziheng and Wong, Wendy Shuk Wan and Nielsen, Rasmus",
note = "Key Words: positive selection • codon-substitution models • Bayes empirical Bayes",
year = "2005",
doi = "10.1093/molbev/msi097",
language = "English",
volume = "22",
pages = "1107--1118",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "4",

}

RIS

TY - JOUR

T1 - Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection

AU - Yang, Ziheng

AU - Wong, Wendy Shuk Wan

AU - Nielsen, Rasmus

N1 - Key Words: positive selection • codon-substitution models • Bayes empirical Bayes

PY - 2005

Y1 - 2005

N2 - Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.

AB - Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.

U2 - 10.1093/molbev/msi097

DO - 10.1093/molbev/msi097

M3 - Journal article

C2 - 15689528

VL - 22

SP - 1107

EP - 1118

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 4

ER -
