Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction. / Yang, Chunyan; Bohmann, Kristine; Wang, Xiaoyang; Cai, Wang; Wales, Nathan; Ding, Zhaoli; Gopalakrishnan, Shyam; Yu, Douglas W.

In: Methods in Ecology and Evolution, Vol. 12, No. 7, 2021, p. 1252-1264.

Harvard

Yang, C, Bohmann, K, Wang, X, Cai, W, Wales, N, Ding, Z, Gopalakrishnan, S & Yu, DW 2021, 'Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction', Methods in Ecology and Evolution, vol. 12, no. 7, pp. 1252-1264. https://doi.org/10.1111/2041-210X.13602

APA

Yang, C., Bohmann, K., Wang, X., Cai, W., Wales, N., Ding, Z., Gopalakrishnan, S., & Yu, D. W. (2021). Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction. Methods in Ecology and Evolution, 12(7), 1252-1264. https://doi.org/10.1111/2041-210X.13602

Vancouver

Yang C, Bohmann K, Wang X, Cai W, Wales N, Ding Z et al. Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction. Methods in Ecology and Evolution. 2021;12(7):1252-1264. https://doi.org/10.1111/2041-210X.13602

Author

Yang, Chunyan ; Bohmann, Kristine ; Wang, Xiaoyang ; Cai, Wang ; Wales, Nathan ; Ding, Zhaoli ; Gopalakrishnan, Shyam ; Yu, Douglas W. / Biodiversity Soup II : A bulk-sample metabarcoding pipeline emphasizing error reduction. In: Methods in Ecology and Evolution. 2021 ; Vol. 12, No. 7. pp. 1252-1264.

Bibtex

@article{7416c2826a3c418e87d9f0a9b3b7aa22,
title = "Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction",
abstract = "Despite widespread recognition of its great promise to aid decision-making in environmental management, the applied use of metabarcoding requires improvements to reduce the multiple errors that arise during PCR amplification, sequencing and library generation. We present a co-designed wet-lab and bioinformatic workflow for metabarcoding bulk samples that removes both false-positive (tag jumps, chimeras, erroneous sequences) and false-negative ({\textquoteleft}dropout{\textquoteright}) errors. However, we find that it is not possible to recover relative-abundance information from amplicon data, due to persistent species-specific biases. To present and validate our workflow, we created eight mock arthropod soups, all containing the same 248 arthropod morphospecies but differing in absolute and relative DNA concentrations, and we ran them under five different PCR conditions. Our pipeline includes qPCR-optimized PCR annealing temperature and cycle number, twin-tagging, multiple independent PCR replicates per sample, and negative and positive controls. In the bioinformatic portion, we introduce Begum, which is a new version of DAMe (Zepeda-Mendoza et al., 2016. BMC Res. Notes 9:255) that ignores heterogeneity spacers, allows primer mismatches when demultiplexing samples and is more efficient. Like DAMe, Begum removes tag-jumped reads and removes sequence errors by keeping only sequences that appear in more than one PCR above a minimum copy number per PCR. The filtering thresholds are user-configurable. We report that OTU dropout frequency and taxonomic amplification bias are both reduced by using a PCR annealing temperature and cycle number on the low ends of the ranges currently used for the Leray-FolDegenRev primers. We also report that tag jumps and erroneous sequences can be nearly eliminated with Begum filtering, at the cost of only a small rise in dropouts. 
We replicate published findings that uneven size distribution of input biomasses leads to greater dropout frequency and that OTU size is a poor predictor of species input biomass. Finally, we find no evidence for {\textquoteleft}tag-biased{\textquoteright} PCR amplification. To aid learning, reproducibility, and the design and testing of alternative metabarcoding pipelines, we provide our Illumina and input-species sequence datasets, scripts, a spreadsheet for designing primer tags and a tutorial.",
keywords = "bulk-sample DNA metabarcoding, environmental DNA, environmental impact assessment, false negatives, false positives, Illumina high-throughput sequencing, tag bias",
author = "Chunyan Yang and Kristine Bohmann and Xiaoyang Wang and Wang Cai and Nathan Wales and Zhaoli Ding and Shyam Gopalakrishnan and Yu, {Douglas W.}",
note = "Funding Information: The authors thank Mr. Zongxu Li in South China Barcoding Center for help with arthropod selection and morphological identification. C.Y., D.W.Y., X.W. and W.C. were supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA20050202), the National Natural Science Foundation of China (41661144002, 31670536, 31400470, 31500305), the Key Research Program of Frontier Sciences, CAS (QYZDY‐SSW‐SMC024), the Bureau of International Cooperation (GJHZ1754), the Ministry of Science and Technology of China (2012FY110800), the State Key Laboratory of Genetic Resources and Evolution (GREKF18‐04) at the Kunming Institute of Zoology, the University of East Anglia and the University of Chinese Academy of Sciences. D.W.Y. was supported by a Leverhulme Trust Research Fellowship. K.B. was supported by the Danish Council for Independent Research (DFF‐5051‐00140). Publisher Copyright: {\textcopyright} 2021 British Ecological Society",
year = "2021",
doi = "10.1111/2041-210X.13602",
language = "English",
volume = "12",
pages = "1252--1264",
journal = "Methods in Ecology and Evolution",
issn = "2041-210X",
publisher = "Wiley-Blackwell",
number = "7",

}

RIS

TY - JOUR

T1 - Biodiversity Soup II

T2 - A bulk-sample metabarcoding pipeline emphasizing error reduction

AU - Yang, Chunyan

AU - Bohmann, Kristine

AU - Wang, Xiaoyang

AU - Cai, Wang

AU - Wales, Nathan

AU - Ding, Zhaoli

AU - Gopalakrishnan, Shyam

AU - Yu, Douglas W.

N1 - Funding Information: The authors thank Mr. Zongxu Li in South China Barcoding Center for help with arthropod selection and morphological identification. C.Y., D.W.Y., X.W. and W.C. were supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA20050202), the National Natural Science Foundation of China (41661144002, 31670536, 31400470, 31500305), the Key Research Program of Frontier Sciences, CAS (QYZDY‐SSW‐SMC024), the Bureau of International Cooperation (GJHZ1754), the Ministry of Science and Technology of China (2012FY110800), the State Key Laboratory of Genetic Resources and Evolution (GREKF18‐04) at the Kunming Institute of Zoology, the University of East Anglia and the University of Chinese Academy of Sciences. D.W.Y. was supported by a Leverhulme Trust Research Fellowship. K.B. was supported by the Danish Council for Independent Research (DFF‐5051‐00140). Publisher Copyright: © 2021 British Ecological Society

PY - 2021

Y1 - 2021

N2 - Despite widespread recognition of its great promise to aid decision-making in environmental management, the applied use of metabarcoding requires improvements to reduce the multiple errors that arise during PCR amplification, sequencing and library generation. We present a co-designed wet-lab and bioinformatic workflow for metabarcoding bulk samples that removes both false-positive (tag jumps, chimeras, erroneous sequences) and false-negative (‘dropout’) errors. However, we find that it is not possible to recover relative-abundance information from amplicon data, due to persistent species-specific biases. To present and validate our workflow, we created eight mock arthropod soups, all containing the same 248 arthropod morphospecies but differing in absolute and relative DNA concentrations, and we ran them under five different PCR conditions. Our pipeline includes qPCR-optimized PCR annealing temperature and cycle number, twin-tagging, multiple independent PCR replicates per sample, and negative and positive controls. In the bioinformatic portion, we introduce Begum, which is a new version of DAMe (Zepeda-Mendoza et al., 2016. BMC Res. Notes 9:255) that ignores heterogeneity spacers, allows primer mismatches when demultiplexing samples and is more efficient. Like DAMe, Begum removes tag-jumped reads and removes sequence errors by keeping only sequences that appear in more than one PCR above a minimum copy number per PCR. The filtering thresholds are user-configurable. We report that OTU dropout frequency and taxonomic amplification bias are both reduced by using a PCR annealing temperature and cycle number on the low ends of the ranges currently used for the Leray-FolDegenRev primers. We also report that tag jumps and erroneous sequences can be nearly eliminated with Begum filtering, at the cost of only a small rise in dropouts. 
We replicate published findings that uneven size distribution of input biomasses leads to greater dropout frequency and that OTU size is a poor predictor of species input biomass. Finally, we find no evidence for ‘tag-biased’ PCR amplification. To aid learning, reproducibility, and the design and testing of alternative metabarcoding pipelines, we provide our Illumina and input-species sequence datasets, scripts, a spreadsheet for designing primer tags and a tutorial.

AB - Despite widespread recognition of its great promise to aid decision-making in environmental management, the applied use of metabarcoding requires improvements to reduce the multiple errors that arise during PCR amplification, sequencing and library generation. We present a co-designed wet-lab and bioinformatic workflow for metabarcoding bulk samples that removes both false-positive (tag jumps, chimeras, erroneous sequences) and false-negative (‘dropout’) errors. However, we find that it is not possible to recover relative-abundance information from amplicon data, due to persistent species-specific biases. To present and validate our workflow, we created eight mock arthropod soups, all containing the same 248 arthropod morphospecies but differing in absolute and relative DNA concentrations, and we ran them under five different PCR conditions. Our pipeline includes qPCR-optimized PCR annealing temperature and cycle number, twin-tagging, multiple independent PCR replicates per sample, and negative and positive controls. In the bioinformatic portion, we introduce Begum, which is a new version of DAMe (Zepeda-Mendoza et al., 2016. BMC Res. Notes 9:255) that ignores heterogeneity spacers, allows primer mismatches when demultiplexing samples and is more efficient. Like DAMe, Begum removes tag-jumped reads and removes sequence errors by keeping only sequences that appear in more than one PCR above a minimum copy number per PCR. The filtering thresholds are user-configurable. We report that OTU dropout frequency and taxonomic amplification bias are both reduced by using a PCR annealing temperature and cycle number on the low ends of the ranges currently used for the Leray-FolDegenRev primers. We also report that tag jumps and erroneous sequences can be nearly eliminated with Begum filtering, at the cost of only a small rise in dropouts. 
We replicate published findings that uneven size distribution of input biomasses leads to greater dropout frequency and that OTU size is a poor predictor of species input biomass. Finally, we find no evidence for ‘tag-biased’ PCR amplification. To aid learning, reproducibility, and the design and testing of alternative metabarcoding pipelines, we provide our Illumina and input-species sequence datasets, scripts, a spreadsheet for designing primer tags and a tutorial.

KW - bulk-sample DNA metabarcoding

KW - environmental DNA

KW - environmental impact assessment

KW - false negatives

KW - false positives

KW - Illumina high-throughput sequencing

KW - tag bias

U2 - 10.1111/2041-210X.13602

DO - 10.1111/2041-210X.13602

M3 - Journal article

AN - SCOPUS:85105217542

VL - 12

SP - 1252

EP - 1264

JO - Methods in Ecology and Evolution

JF - Methods in Ecology and Evolution

SN - 2041-210X

IS - 7

ER -

ID: 272651910