Evaluating the role of reference-genome phylogenetic distance on evolutionary inference
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. / Prasad, Aparna; Lorenzen, Eline D.; Westbury, Michael V.
In: Molecular Ecology Resources, Vol. 22, No. 1, 2022, p. 45-55.
RIS
TY - JOUR
T1 - Evaluating the role of reference-genome phylogenetic distance on evolutionary inference
AU - Prasad, Aparna
AU - Lorenzen, Eline D.
AU - Westbury, Michael V.
PY - 2022
Y1 - 2022
N2 - When a high-quality genome assembly of a target species is unavailable, an option to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal (beluga whale) and a bird species (rowi kiwi) to evaluate whether reference genome phylogenetic distance can impact downstream demographic (Pairwise Sequentially Markovian Coalescent) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species of varying phylogenetic distance (from conspecific to genome-wide divergence of >7%), and de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, it is not pronounced until using a reference genome with >3% divergence from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic results, but are able with the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. We find that increased phylogenetic distance has a pronounced impact on genetic diversity estimates; heterozygosity estimates deviate incrementally with increasing phylogenetic distance. Moreover, runs of homozygosity are largely undetectable when mapping to any nonconspecific assembly. However, these biases can be reduced when mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting reference genomes. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly, while still producing robust, evolutionary inference.
KW - bioinformatics
KW - phyloinformatics
KW - genomics
KW - proteomics
KW - inbreeding
KW - molecular evolution
KW - population dynamics
KW - HISTORY
KW - ALIGNMENT
U2 - 10.1111/1755-0998.13457
DO - 10.1111/1755-0998.13457
M3 - Journal article
C2 - 34176238
VL - 22
SP - 45
EP - 55
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
SN - 1755-098X
IS - 1
ER -
ID: 275994469