Looking for Darwin in Genomic Sequences-Validity and Success of Statistical Methods
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Looking for Darwin in Genomic Sequences-Validity and Success of Statistical Methods. / Zhai, Weiwei; Nielsen, Rasmus; Goldman, Nick; Yang, Ziheng.
In: Molecular Biology and Evolution, Vol. 29, No. 10, 2012, p. 2889-2893.
RIS
TY - JOUR
T1 - Looking for Darwin in Genomic Sequences-Validity and Success of Statistical Methods
AU - Zhai, Weiwei
AU - Nielsen, Rasmus
AU - Goldman, Nick
AU - Yang, Ziheng
PY - 2012
Y1 - 2012
N2 - The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate d(S), but not by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.
AB - The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate d(S), but not by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.
KW - codon model
KW - Darwinian selection
KW - likelihood-ratio test
U2 - 10.1093/molbev/mss104
DO - 10.1093/molbev/mss104
M3 - Journal article
C2 - 22490825
VL - 29
SP - 2889
EP - 2893
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
SN - 0737-4038
IS - 10
ER -
ID: 49695893