occCite: Tools for querying and managing large biodiversity occurrence datasets
Research output: Contribution to journal › Journal article › peer-review
Standard
occCite: Tools for querying and managing large biodiversity occurrence datasets. / Owens, Hannah L.; Merow, Cory; Maitner, Brian S.; Kass, Jamie M.; Barve, Vijay; Guralnick, Robert P.
In: Ecography, Vol. 44, No. 8, 2021, p. 1228-1235.
RIS
TY - JOUR
T1 - occCite: Tools for querying and managing large biodiversity occurrence datasets
AU - Owens, Hannah L.
AU - Merow, Cory
AU - Maitner, Brian S.
AU - Kass, Jamie M.
AU - Barve, Vijay
AU - Guralnick, Robert P.
N1 - Funding Information: Funding for this project was provided by a seed grant from the University of Florida Biodiversity and Informatics Institutes and a second-place Ebbe Nielsen Challenge prize from the Global Biodiversity Information Facility. CM acknowledges funding from NSF grants DBI‐1913673 and DBI‐1661510. Publisher Copyright: © 2021 The Authors. Ecography published by John Wiley & Sons Ltd on behalf of Nordic Society Oikos
PY - 2021
Y1 - 2021
N2 - The amount of observational and specimen-based biodiversity data available to researchers is increasing exponentially, yet the ability to manage and cite large, complex biodiversity datasets lags behind. This management and citation gap impedes reproducibility for data users and the ability for data publishers to track use and accumulate use citations, ultimately harming the longer-term sustainability of the still-emerging enterprise of research data-sharing. Here we present an R package, occCite (v. 0.4.7), to aid researchers in querying large species occurrence data aggregators (specifically, the Global Biodiversity Information Facility, GBIF, and the Botanical Information and Ecology Network, BIEN), and store metadata such as primary data providers, database accession dates, DOIs, and the taxonomic source used for search terms. occCite also includes tools to summarize and visualize query results and generate citation lists of all data providers and software packages used during the query process. We provide examples of a basic occurrence search and citation workflow as well as an advanced workflow using features for custom optimized searches, visualization, and summary procedures. occCite improves upon existing R packages by uniting data from powerful API-based query packages (rgbif and BIEN) into a unified object-based framework, while maintaining metadata vital to best-practice recommendations for documenting biodiversity analysis workflows. occCite aims to efficiently close the gap in the citation cycle between primary data providers and final research products, allowing researchers to meet dataset documentation standards without sacrificing time and resources to the demands of providing increasing levels of detail on their datasets.
KW - citations
KW - database aggregation
KW - metadata
KW - presence-only data
KW - R package
U2 - 10.1111/ecog.05618
DO - 10.1111/ecog.05618
M3 - Journal article
AN - SCOPUS:85108256703
VL - 44
SP - 1228
EP - 1235
JO - Ecography
JF - Ecography
SN - 0906-7590
IS - 8
ER -
ID: 273365467