Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Updating splits, lumps, and shuffles: Reconciling GenBank names with standardized avian taxonomies. / Hosner, Peter A.; Zhao, Min; Kimball, Rebecca T.; Braun, Edward L.; Burleigh, J. Gordon.
In: Ornithology, Vol. 139, ukac045, 2022.
RIS
TY - JOUR
T1 - Updating splits, lumps, and shuffles
T2 - Reconciling GenBank names with standardized avian taxonomies
AU - Hosner, Peter A.
AU - Zhao, Min
AU - Kimball, Rebecca T.
AU - Braun, Edward L.
AU - Burleigh, J. Gordon
PY - 2022
Y1 - 2022
N2 - Biodiversity research has advanced by testing expectations of ecological and evolutionary hypotheses through the linking of large-scale genetic, distributional, and trait datasets. The rise of molecular systematics over the past 30 years has resulted in a wealth of DNA sequences from around the globe. Yet, advances in molecular systematics also have created taxonomic instability, as new estimates of evolutionary relationships and interpretations of species limits have required widespread scientific name changes. Taxonomic instability, colloquially "splits, lumps, and shuffles," presents logistical challenges to large-scale biodiversity research because (1) the same species or sets of populations may be listed under different names in different data sources, or (2) the same name may apply to different sets of populations representing different taxonomic concepts. Consequently, distributional and trait data are often difficult to link directly to primary DNA sequence data without extensive and time-consuming curation. Here, we present RANT: Reconciliation of Avian NCBI Taxonomy. RANT applies taxonomic reconciliation to standardize avian taxon names in use in NCBI GenBank, a primary source of genetic data, to a widely used and regularly updated avian taxonomy: eBird/Clements. Of 14,341 avian species/subspecies names in GenBank, 11,031 directly matched an eBird/Clements name; these link to more than 6 million nucleotide sequences. For the remaining unmatched avian names in GenBank, we used Avibase's system of taxonomic concepts, taxonomic descriptions in Cornell's Birds of the World, and DNA sequence metadata to identify corresponding eBird/Clements names. Reconciled names linked to more than 600,000 nucleotide sequences, approximately 9% of all avian sequences on GenBank. Nearly 10% of eBird/Clements names had nucleotide sequences listed under 2 or more GenBank names. Our taxonomic reconciliation is a first step towards rigorous and open-source curation of avian GenBank sequences and is available at GitHub, where it can be updated to correspond to future annual eBird/Clements taxonomic updates.
KW - big data
KW - DNA sequence data
KW - genomics
KW - NCBI
KW - nomenclature
KW - MOLECULAR SYSTEMATICS
KW - WARBLERS PASSERIFORMES
KW - REVEALS
KW - AVES
KW - TREE
KW - BIRDS
KW - TIMALIIDAE
KW - TYRANNIDAE
KW - PHYLOGENY
KW - DIVERSITY
U2 - 10.1093/ornithology/ukac045
DO - 10.1093/ornithology/ukac045
M3 - Journal article
VL - 139
JO - Ornithology
JF - Ornithology
SN - 0004-8038
M1 - ukac045
ER -
ID: 320749912