Evaluating the role of reference-genome phylogenetic distance on evolutionary inference

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. / Prasad, Aparna; Lorenzen, Eline D.; Westbury, Michael V.

In: Molecular Ecology Resources, Vol. 22, No. 1, 2022, p. 45-55.

Harvard

Prasad, A, Lorenzen, ED & Westbury, MV 2022, 'Evaluating the role of reference-genome phylogenetic distance on evolutionary inference', Molecular Ecology Resources, vol. 22, no. 1, pp. 45-55. https://doi.org/10.1111/1755-0998.13457

APA

Prasad, A., Lorenzen, E. D., & Westbury, M. V. (2022). Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. Molecular Ecology Resources, 22(1), 45-55. https://doi.org/10.1111/1755-0998.13457

Vancouver

Prasad A, Lorenzen ED, Westbury MV. Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. Molecular Ecology Resources. 2022;22(1):45-55. https://doi.org/10.1111/1755-0998.13457

Author

Prasad, Aparna ; Lorenzen, Eline D. ; Westbury, Michael V. / Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. In: Molecular Ecology Resources. 2022 ; Vol. 22, No. 1. pp. 45-55.

Bibtex

@article{f83735046c524f32b906b3f5190813aa,
title = "Evaluating the role of reference-genome phylogenetic distance on evolutionary inference",
abstract = "When a high-quality genome assembly of a target species is unavailable, an option to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal (beluga whale) and a bird species (rowi kiwi) to evaluate whether reference genome phylogenetic distance can impact downstream demographic (Pairwise Sequentially Markovian Coalescent) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species of varying phylogenetic distance (from conspecific to genome-wide divergence of >7%), and de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, it is not pronounced until using a reference genome with >3% divergence from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic results, but are able with the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. We find that increased phylogenetic distance has a pronounced impact on genetic diversity estimates; heterozygosity estimates deviate incrementally with increasing phylogenetic distance. Moreover, runs of homozygosity are largely undetectable when mapping to any nonconspecific assembly. However, these biases can be reduced when mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting reference genomes. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly, while still producing robust, evolutionary inference.",
keywords = "bioinformatics, phyloinformatics, genomics, proteomics, inbreeding, molecular evolution, population dynamics, HISTORY, ALIGNMENT",
author = "Prasad, Aparna and Lorenzen, {Eline D.} and Westbury, {Michael V.}",
year = "2022",
doi = "10.1111/1755-0998.13457",
language = "English",
volume = "22",
pages = "45--55",
journal = "Molecular Ecology Resources",
issn = "1755-098X",
publisher = "Wiley-Blackwell",
number = "1",

}

RIS

TY - JOUR

T1 - Evaluating the role of reference-genome phylogenetic distance on evolutionary inference

AU - Prasad, Aparna

AU - Lorenzen, Eline D.

AU - Westbury, Michael V.

PY - 2022

Y1 - 2022

AB - When a high-quality genome assembly of a target species is unavailable, an option to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal (beluga whale) and a bird species (rowi kiwi) to evaluate whether reference genome phylogenetic distance can impact downstream demographic (Pairwise Sequentially Markovian Coalescent) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species of varying phylogenetic distance (from conspecific to genome-wide divergence of >7%), and de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, it is not pronounced until using a reference genome with >3% divergence from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic results, but are able with the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. We find that increased phylogenetic distance has a pronounced impact on genetic diversity estimates; heterozygosity estimates deviate incrementally with increasing phylogenetic distance. Moreover, runs of homozygosity are largely undetectable when mapping to any nonconspecific assembly. However, these biases can be reduced when mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting reference genomes. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly, while still producing robust, evolutionary inference.

KW - bioinformatics

KW - phyloinformatics

KW - genomics

KW - proteomics

KW - inbreeding

KW - molecular evolution

KW - population dynamics

KW - HISTORY

KW - ALIGNMENT

U2 - 10.1111/1755-0998.13457

DO - 10.1111/1755-0998.13457

M3 - Journal article

C2 - 34176238

VL - 22

SP - 45

EP - 55

JO - Molecular Ecology Resources

JF - Molecular Ecology Resources

SN - 1755-098X

IS - 1

ER -
