A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. / Gabrielaite, Migle; Torp, Mathias Husted; Rasmussen, Malthe Sebro; Andreu-Sánchez, Sergio; Vieira, Filipe Garrett; Pedersen, Christina Bligaard; Kinalis, Savvas; Madsen, Majbritt Busk; Kodama, Miyako; Demircan, Gül Sude; Simonyan, Arman; Yde, Christina Westmose; Olsen, Lars Rønn; Marvig, Rasmus L.; Østrup, Olga; Rossing, Maria; Nielsen, Finn Cilius; Winther, Ole; Bagger, Frederik Otzen.

In: Cancers, Vol. 13, No. 24, 6283, 2021.


Harvard

Gabrielaite, M, Torp, MH, Rasmussen, MS, Andreu-Sánchez, S, Vieira, FG, Pedersen, CB, Kinalis, S, Madsen, MB, Kodama, M, Demircan, GS, Simonyan, A, Yde, CW, Olsen, LR, Marvig, RL, Østrup, O, Rossing, M, Nielsen, FC, Winther, O & Bagger, FO 2021, 'A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data', Cancers, vol. 13, no. 24, 6283. https://doi.org/10.3390/cancers13246283

APA

Gabrielaite, M., Torp, M. H., Rasmussen, M. S., Andreu-Sánchez, S., Vieira, F. G., Pedersen, C. B., Kinalis, S., Madsen, M. B., Kodama, M., Demircan, G. S., Simonyan, A., Yde, C. W., Olsen, L. R., Marvig, R. L., Østrup, O., Rossing, M., Nielsen, F. C., Winther, O., & Bagger, F. O. (2021). A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. Cancers, 13(24), 6283. https://doi.org/10.3390/cancers13246283

Vancouver

Gabrielaite M, Torp MH, Rasmussen MS, Andreu-Sánchez S, Vieira FG, Pedersen CB et al. A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. Cancers. 2021;13(24):6283. https://doi.org/10.3390/cancers13246283

Author

Gabrielaite, Migle ; Torp, Mathias Husted ; Rasmussen, Malthe Sebro ; Andreu-Sánchez, Sergio ; Vieira, Filipe Garrett ; Pedersen, Christina Bligaard ; Kinalis, Savvas ; Madsen, Majbritt Busk ; Kodama, Miyako ; Demircan, Gül Sude ; Simonyan, Arman ; Yde, Christina Westmose ; Olsen, Lars Rønn ; Marvig, Rasmus L. ; Østrup, Olga ; Rossing, Maria ; Nielsen, Finn Cilius ; Winther, Ole ; Bagger, Frederik Otzen. / A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. In: Cancers. 2021 ; Vol. 13, No. 24.

Bibtex

@article{6e366576eb7044de872aea21d1f19bc2,
title = "A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data",
abstract = "Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow single-protocol full genomic profiling. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. Additionally, for nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias in CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Several tools performed better on NA12878, which could be a result of overfitting. We suggest combining the best tools, also drawing on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. The total number of called variants could potentially be reduced by using background panels to filter out frequently called variants.",
keywords = "Benchmark, Bioinformatics, Copy-number variation (CNV), Structural variant, Whole exome sequencing (WES), Whole genome sequencing (WGS)",
author = "Migle Gabrielaite and Torp, {Mathias Husted} and Rasmussen, {Malthe Sebro} and Sergio Andreu-S{\'a}nchez and Vieira, {Filipe Garrett} and Pedersen, {Christina Bligaard} and Savvas Kinalis and Madsen, {Majbritt Busk} and Miyako Kodama and Demircan, {G{\"u}l Sude} and Arman Simonyan and Yde, {Christina Westmose} and Olsen, {Lars R{\o}nn} and Marvig, {Rasmus L.} and Olga {\O}strup and Maria Rossing and Nielsen, {Finn Cilius} and Ole Winther and Bagger, {Frederik Otzen}",
note = "Publisher Copyright: {\textcopyright} 2021 by the authors. Licensee MDPI, Basel, Switzerland.",
year = "2021",
doi = "10.3390/cancers13246283",
language = "English",
volume = "13",
journal = "Cancers",
issn = "2072-6694",
publisher = "MDPI AG",
number = "24",

}

RIS

TY - JOUR

T1 - A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data

AU - Gabrielaite, Migle

AU - Torp, Mathias Husted

AU - Rasmussen, Malthe Sebro

AU - Andreu-Sánchez, Sergio

AU - Vieira, Filipe Garrett

AU - Pedersen, Christina Bligaard

AU - Kinalis, Savvas

AU - Madsen, Majbritt Busk

AU - Kodama, Miyako

AU - Demircan, Gül Sude

AU - Simonyan, Arman

AU - Yde, Christina Westmose

AU - Olsen, Lars Rønn

AU - Marvig, Rasmus L.

AU - Østrup, Olga

AU - Rossing, Maria

AU - Nielsen, Finn Cilius

AU - Winther, Ole

AU - Bagger, Frederik Otzen

N1 - Publisher Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

PY - 2021

Y1 - 2021

N2 - Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow single-protocol full genomic profiling. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. Additionally, for nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias in CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Several tools performed better on NA12878, which could be a result of overfitting. We suggest combining the best tools, also drawing on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. The total number of called variants could potentially be reduced by using background panels to filter out frequently called variants.

AB - Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow single-protocol full genomic profiling. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. Additionally, for nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias in CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Several tools performed better on NA12878, which could be a result of overfitting. We suggest combining the best tools, also drawing on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. The total number of called variants could potentially be reduced by using background panels to filter out frequently called variants.

KW - Benchmark

KW - Bioinformatics

KW - Copy-number variation (CNV)

KW - Structural variant

KW - Whole exome sequencing (WES)

KW - Whole genome sequencing (WGS)

U2 - 10.3390/cancers13246283

DO - 10.3390/cancers13246283

M3 - Journal article

C2 - 34944901

AN - SCOPUS:85121460687

VL - 13

JO - Cancers

JF - Cancers

SN - 2072-6694

IS - 24

M1 - 6283

ER -

ID: 288123029