Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection

Research output: Contribution to journal; Journal article; Research; peer-review

Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
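The difference between the two calculations can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes two site classes whose ω ratios are held fixed, so that only the class proportions are uncertain, and it replaces the integral over parameters with a sum over a small grid under a uniform prior. The function names `neb_posterior` and `beb_posterior`, the per-site likelihood values, and the grid are all hypothetical and chosen only for illustration.

```python
import numpy as np

def neb_posterior(site_lik, props):
    """Naive EB: plug point estimates of the class proportions into Bayes' theorem.

    site_lik: (n_sites, n_classes) likelihood of each site's data given each class
    props:    (n_classes,) class proportions, e.g. maximum likelihood estimates
    """
    site_lik = np.asarray(site_lik, dtype=float)
    props = np.asarray(props, dtype=float)
    joint = site_lik * props                      # p_k * f(x_h | class k)
    return joint / joint.sum(axis=1, keepdims=True)

def beb_posterior(site_lik, props_grid, prior=None):
    """Grid approximation: average the NEB posterior over parameter uncertainty.

    props_grid: (n_grid, n_classes) candidate proportion vectors
    prior:      (n_grid,) prior weights; uniform if omitted (an assumption here)
    """
    site_lik = np.asarray(site_lik, dtype=float)
    props_grid = np.asarray(props_grid, dtype=float)
    n_grid = props_grid.shape[0]
    if prior is None:
        prior = np.full(n_grid, 1.0 / n_grid)
    # Log-likelihood of the whole alignment at each grid point (sites independent).
    log_lik = np.log(site_lik @ props_grid.T).sum(axis=0)
    log_w = log_lik + np.log(prior)
    w = np.exp(log_w - log_w.max())               # subtract max for numerical stability
    w /= w.sum()                                  # posterior weight of each grid point
    post = np.zeros_like(site_lik)
    for weight, props in zip(w, props_grid):
        post += weight * neb_posterior(site_lik, props)
    return post

# Hypothetical per-site likelihoods; column 0 is the class with omega < 1,
# column 1 is the class with omega > 1.
site_lik = np.array([[0.20, 0.80],
                     [0.90, 0.10],
                     [0.70, 0.30]])
mle_props = np.array([0.9, 0.1])                  # hypothetical point estimates
grid = np.array([[1.0 - p, p] for p in np.linspace(0.05, 0.95, 19)])

neb = neb_posterior(site_lik, mle_props)
beb = beb_posterior(site_lik, grid)
print("NEB P(omega > 1 | data):", neb[:, 1])
print("BEB P(omega > 1 | data):", beb[:, 1])
```

With only three sites the grid weights stay spread out, so the averaged posterior depends much less on any single point estimate than the plug-in calculation does. That is the small-data situation the abstract describes.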
Original language: English
Journal: Molecular Biology and Evolution
Volume: 22
Issue number: 4
Pages (from-to): 1107-1118
ISSN: 0737-4038
Publication status: Published - 2005

Bibliographical note

Key Words: positive selection • codon-substitution models • Bayes empirical Bayes

ID: 87244