Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies. / Hosner, Peter A.; Zhao, Min; Kimball, Rebecca T.; Braun, Edward L.; Burleigh, J. Gordon.

In: Ornithology, Vol. 139, ukac045, 2022.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Hosner, PA, Zhao, M, Kimball, RT, Braun, EL & Burleigh, JG 2022, 'Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies', Ornithology, vol. 139, ukac045. https://doi.org/10.1093/ornithology/ukac045

APA

Hosner, P. A., Zhao, M., Kimball, R. T., Braun, E. L., & Burleigh, J. G. (2022). Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies. Ornithology, 139, Article ukac045. https://doi.org/10.1093/ornithology/ukac045

Vancouver

Hosner PA, Zhao M, Kimball RT, Braun EL, Burleigh JG. Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies. Ornithology. 2022;139:ukac045. https://doi.org/10.1093/ornithology/ukac045

Author

Hosner, Peter A.; Zhao, Min; Kimball, Rebecca T.; Braun, Edward L.; Burleigh, J. Gordon. / Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies. In: Ornithology. 2022; Vol. 139.

BibTeX

@article{cd9b1facca474a3998813c4f1eceaca0,
title = "Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies",
abstract = "Biodiversity research has advanced by testing expectations of ecological and evolutionary hypotheses through the linking of large-scale genetic, distributional, and trait datasets. The rise of molecular systematics over the past 30 years has resulted in a wealth of DNA sequences from around the globe. Yet, advances in molecular systematics also have created taxonomic instability, as new estimates of evolutionary relationships and interpretations of species limits have required widespread scientific name changes. Taxonomic instability, colloquially {"}splits, lumps, and shuffles,{"} presents logistical challenges to large-scale biodiversity research because (1) the same species or sets of populations may be listed under different names in different data sources, or (2) the same name may apply to different sets of populations representing different taxonomic concepts. Consequently, distributional and trait data are often difficult to link directly to primary DNA sequence data without extensive and time-consuming curation. Here, we present RANT: Reconciliation of Avian NCBI Taxonomy. RANT applies taxonomic reconciliation to standardize avian taxon names in use in NCBI GenBank, a primary source of genetic data, to a widely used and regularly updated avian taxonomy: eBird/Clements. Of 14,341 avian species/subspecies names in GenBank, 11,031 directly matched an eBird/Clements name; these link to more than 6 million nucleotide sequences. For the remaining unmatched avian names in GenBank, we used Avibase's system of taxonomic concepts, taxonomic descriptions in Cornell's Birds of the World, and DNA sequence metadata to identify corresponding eBird/Clements names. Reconciled names linked to more than 600,000 nucleotide sequences, approximately 9% of all avian sequences on GenBank. Nearly 10% of eBird/Clements names had nucleotide sequences listed under 2 or more GenBank names.
Our taxonomic reconciliation is a first step towards rigorous and open-source curation of avian GenBank sequences and is available at GitHub, where it can be updated to correspond to future annual eBird/Clements taxonomic updates.",
keywords = "big data, DNA sequence data, genomics, NCBI, nomenclature, MOLECULAR SYSTEMATICS, WARBLERS PASSERIFORMES, REVEALS, AVES, TREE, BIRDS, TIMALIIDAE, TYRANNIDAE, PHYLOGENY, DIVERSITY",
author = "Hosner, {Peter A.} and Zhao, Min and Kimball, {Rebecca T.} and Braun, {Edward L.} and Burleigh, {J. Gordon}",
year = "2022",
doi = "10.1093/ornithology/ukac045",
language = "English",
volume = "139",
journal = "Ornithology",
issn = "0004-8038",
publisher = "American Ornithological Society",

}

RIS

TY - JOUR

T1 - Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies

AU - Hosner, Peter A.

AU - Zhao, Min

AU - Kimball, Rebecca T.

AU - Braun, Edward L.

AU - Burleigh, J. Gordon

PY - 2022

Y1 - 2022

N2 - Biodiversity research has advanced by testing expectations of ecological and evolutionary hypotheses through the linking of large-scale genetic, distributional, and trait datasets. The rise of molecular systematics over the past 30 years has resulted in a wealth of DNA sequences from around the globe. Yet, advances in molecular systematics also have created taxonomic instability, as new estimates of evolutionary relationships and interpretations of species limits have required widespread scientific name changes. Taxonomic instability, colloquially "splits, lumps, and shuffles," presents logistical challenges to large-scale biodiversity research because (1) the same species or sets of populations may be listed under different names in different data sources, or (2) the same name may apply to different sets of populations representing different taxonomic concepts. Consequently, distributional and trait data are often difficult to link directly to primary DNA sequence data without extensive and time-consuming curation. Here, we present RANT: Reconciliation of Avian NCBI Taxonomy. RANT applies taxonomic reconciliation to standardize avian taxon names in use in NCBI GenBank, a primary source of genetic data, to a widely used and regularly updated avian taxonomy: eBird/Clements. Of 14,341 avian species/subspecies names in GenBank, 11,031 directly matched an eBird/Clements name; these link to more than 6 million nucleotide sequences. For the remaining unmatched avian names in GenBank, we used Avibase's system of taxonomic concepts, taxonomic descriptions in Cornell's Birds of the World, and DNA sequence metadata to identify corresponding eBird/Clements names. Reconciled names linked to more than 600,000 nucleotide sequences, approximately 9% of all avian sequences on GenBank. Nearly 10% of eBird/Clements names had nucleotide sequences listed under 2 or more GenBank names.
Our taxonomic reconciliation is a first step towards rigorous and open-source curation of avian GenBank sequences and is available at GitHub, where it can be updated to correspond to future annual eBird/Clements taxonomic updates.

AB - Biodiversity research has advanced by testing expectations of ecological and evolutionary hypotheses through the linking of large-scale genetic, distributional, and trait datasets. The rise of molecular systematics over the past 30 years has resulted in a wealth of DNA sequences from around the globe. Yet, advances in molecular systematics also have created taxonomic instability, as new estimates of evolutionary relationships and interpretations of species limits have required widespread scientific name changes. Taxonomic instability, colloquially "splits, lumps, and shuffles," presents logistical challenges to large-scale biodiversity research because (1) the same species or sets of populations may be listed under different names in different data sources, or (2) the same name may apply to different sets of populations representing different taxonomic concepts. Consequently, distributional and trait data are often difficult to link directly to primary DNA sequence data without extensive and time-consuming curation. Here, we present RANT: Reconciliation of Avian NCBI Taxonomy. RANT applies taxonomic reconciliation to standardize avian taxon names in use in NCBI GenBank, a primary source of genetic data, to a widely used and regularly updated avian taxonomy: eBird/Clements. Of 14,341 avian species/subspecies names in GenBank, 11,031 directly matched an eBird/Clements name; these link to more than 6 million nucleotide sequences. For the remaining unmatched avian names in GenBank, we used Avibase's system of taxonomic concepts, taxonomic descriptions in Cornell's Birds of the World, and DNA sequence metadata to identify corresponding eBird/Clements names. Reconciled names linked to more than 600,000 nucleotide sequences, approximately 9% of all avian sequences on GenBank. Nearly 10% of eBird/Clements names had nucleotide sequences listed under 2 or more GenBank names.
Our taxonomic reconciliation is a first step towards rigorous and open-source curation of avian GenBank sequences and is available at GitHub, where it can be updated to correspond to future annual eBird/Clements taxonomic updates.

KW - big data

KW - DNA sequence data

KW - genomics

KW - NCBI

KW - nomenclature

KW - MOLECULAR SYSTEMATICS

KW - WARBLERS PASSERIFORMES

KW - REVEALS

KW - AVES

KW - TREE

KW - BIRDS

KW - TIMALIIDAE

KW - TYRANNIDAE

KW - PHYLOGENY

KW - DIVERSITY

U2 - 10.1093/ornithology/ukac045

DO - 10.1093/ornithology/ukac045

M3 - Journal article

VL - 139

JO - Ornithology

JF - Ornithology

SN - 0004-8038

M1 - ukac045

ER -
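
The two-step reconciliation summarized in the abstract (direct matching of GenBank names against eBird/Clements, then resolving the remaining names) can be sketched in Python. This is a minimal illustration under stated assumptions, not RANT's own code: the functions `reconcile` and `many_to_one`, the `crosswalk` dictionary, and the placeholder taxon names are all hypothetical, and the crosswalk only stands in for the curation RANT carried out with Avibase concepts, Birds of the World descriptions, and sequence metadata. The second function counts the many-to-one case the abstract reports, where one eBird/Clements name receives sequences filed under 2 or more GenBank names.

```python
from collections import defaultdict


def reconcile(genbank_names, reference_names, crosswalk):
    """Map GenBank names to names in a reference taxonomy (hypothetical helper).

    Step 1: keep names that already appear in the reference list.
    Step 2: resolve the rest through a curated crosswalk (a plain dict here,
    standing in for concept-based curation).
    Names resolved by neither step are reported as unresolved.
    """
    mapping = {}
    unresolved = []
    for name in genbank_names:
        if name in reference_names:
            mapping[name] = name  # direct match
        elif name in crosswalk:
            mapping[name] = crosswalk[name]  # reconciled via the crosswalk
        else:
            unresolved.append(name)  # still needs manual curation
    return mapping, unresolved


def many_to_one(mapping):
    """Return reference names reached from two or more GenBank names."""
    sources = defaultdict(set)
    for genbank_name, reference_name in mapping.items():
        sources[reference_name].add(genbank_name)
    return {ref: sorted(names) for ref, names in sources.items() if len(names) >= 2}


if __name__ == "__main__":
    # Hypothetical placeholder taxa, not real GenBank or eBird/Clements entries.
    reference = {"Genus alpha", "Genus beta"}
    crosswalk = {"Oldgenus alpha": "Genus alpha"}  # e.g. a name changed by a genus transfer
    genbank = ["Genus alpha", "Oldgenus alpha", "Genus beta", "Genus gamma"]

    mapping, unresolved = reconcile(genbank, reference, crosswalk)
    print(unresolved)            # ['Genus gamma']
    print(many_to_one(mapping))  # {'Genus alpha': ['Genus alpha', 'Oldgenus alpha']}
```

Because exact string matching accepts any name that appears in the reference list, this sketch does not catch case (2) from the abstract, where one name covers different taxonomic concepts; handling that would need concept-level information such as Avibase's.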
