A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations. / Ling, Guy; Miller, Danielle; Nielsen, Rasmus; Stern, Adi.

In: Molecular Biology and Evolution, Vol. 37, No. 3, 2020, p. 893-903.

Harvard

Ling, G, Miller, D, Nielsen, R & Stern, A 2020, 'A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations', Molecular Biology and Evolution, vol. 37, no. 3, pp. 893-903. https://doi.org/10.1093/molbev/msz248

APA

Ling, G., Miller, D., Nielsen, R., & Stern, A. (2020). A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations. Molecular Biology and Evolution, 37(3), 893-903. https://doi.org/10.1093/molbev/msz248

Vancouver

Ling G, Miller D, Nielsen R, Stern A. A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations. Molecular Biology and Evolution. 2020;37(3):893-903. https://doi.org/10.1093/molbev/msz248

Author

Ling, Guy ; Miller, Danielle ; Nielsen, Rasmus ; Stern, Adi. / A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations. In: Molecular Biology and Evolution. 2020 ; Vol. 37, No. 3. pp. 893-903.

Bibtex

@article{8d5449a22877496890d3b267bc4f3713,
title = "A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations",
abstract = "The probability of point mutations is expected to be highly influenced by the flanking nucleotides that surround them, known as the sequence context. This phenomenon may be mainly attributed to the enzyme that modifies or mutates the genetic material, because most enzymes tend to have specific sequence contexts that dictate their activity. Here, we develop a statistical model that allows for the detection and evaluation of the effects of different sequence contexts on mutation rates from deep population sequencing data. This task is computationally challenging, as the complexity of the model increases exponentially as the context size increases. We established our novel Bayesian method based on sparse model selection methods, with the leading assumption that the number of actual sequence contexts that directly influence mutation rates is minuscule compared with the number of possible sequence contexts. We show that our method is highly accurate on simulated data using pentanucleotide contexts, even when accounting for noisy data. We next analyze empirical population sequencing data from polioviruses and HIV-1 and detect a significant enrichment in sequence contexts associated with deamination by the cellular deaminases ADAR 1/2 and APOBEC3G, respectively. In the current era, where next-generation sequencing data are highly abundant, our approach can be used on any population sequencing data to reveal context-dependent base alterations and may assist in the discovery of novel mutable sites or editing sites.",
keywords = "evolutionary model, MCMC, mutation rates, population genetics, sequence context",
author = "Guy Ling and Danielle Miller and Rasmus Nielsen and Adi Stern",
note = "Publisher Copyright: {\textcopyright} 2019 The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.",
year = "2020",
doi = "10.1093/molbev/msz248",
language = "English",
volume = "37",
pages = "893--903",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "3",

}

RIS

TY  - JOUR

T1  - A Bayesian Framework for Inferring the Influence of Sequence Context on Point Mutations

AU  - Ling, Guy

AU  - Miller, Danielle

AU  - Nielsen, Rasmus

AU  - Stern, Adi

N1  - Publisher Copyright: © 2019 The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

PY  - 2020

Y1  - 2020

N2  - The probability of point mutations is expected to be highly influenced by the flanking nucleotides that surround them, known as the sequence context. This phenomenon may be mainly attributed to the enzyme that modifies or mutates the genetic material, because most enzymes tend to have specific sequence contexts that dictate their activity. Here, we develop a statistical model that allows for the detection and evaluation of the effects of different sequence contexts on mutation rates from deep population sequencing data. This task is computationally challenging, as the complexity of the model increases exponentially as the context size increases. We established our novel Bayesian method based on sparse model selection methods, with the leading assumption that the number of actual sequence contexts that directly influence mutation rates is minuscule compared with the number of possible sequence contexts. We show that our method is highly accurate on simulated data using pentanucleotide contexts, even when accounting for noisy data. We next analyze empirical population sequencing data from polioviruses and HIV-1 and detect a significant enrichment in sequence contexts associated with deamination by the cellular deaminases ADAR 1/2 and APOBEC3G, respectively. In the current era, where next-generation sequencing data are highly abundant, our approach can be used on any population sequencing data to reveal context-dependent base alterations and may assist in the discovery of novel mutable sites or editing sites.

AB  - The probability of point mutations is expected to be highly influenced by the flanking nucleotides that surround them, known as the sequence context. This phenomenon may be mainly attributed to the enzyme that modifies or mutates the genetic material, because most enzymes tend to have specific sequence contexts that dictate their activity. Here, we develop a statistical model that allows for the detection and evaluation of the effects of different sequence contexts on mutation rates from deep population sequencing data. This task is computationally challenging, as the complexity of the model increases exponentially as the context size increases. We established our novel Bayesian method based on sparse model selection methods, with the leading assumption that the number of actual sequence contexts that directly influence mutation rates is minuscule compared with the number of possible sequence contexts. We show that our method is highly accurate on simulated data using pentanucleotide contexts, even when accounting for noisy data. We next analyze empirical population sequencing data from polioviruses and HIV-1 and detect a significant enrichment in sequence contexts associated with deamination by the cellular deaminases ADAR 1/2 and APOBEC3G, respectively. In the current era, where next-generation sequencing data are highly abundant, our approach can be used on any population sequencing data to reveal context-dependent base alterations and may assist in the discovery of novel mutable sites or editing sites.

KW  - evolutionary model

KW  - MCMC

KW  - mutation rates

KW  - population genetics

KW  - sequence context

U2  - 10.1093/molbev/msz248

DO  - 10.1093/molbev/msz248

M3  - Journal article

C2  - 31651955

AN  - SCOPUS:85081100928

VL  - 37

SP  - 893

EP  - 903

JO  - Molecular Biology and Evolution

JF  - Molecular Biology and Evolution

SN  - 0737-4038

IS  - 3

ER  - 