Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets

Research output: Contribution to journal > Journal article > Research > peer-review

Standard

Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets. / Montalbano, Simone; Sánchez, Xabier Calle; Vaez, Morteza; Helenius, Dorte; Werge, Thomas; Ingason, Andrés.

In: Current Protocols, Vol. 2, No. 12, e621, 2022.

Harvard

Montalbano, S, Sánchez, XC, Vaez, M, Helenius, D, Werge, T & Ingason, A 2022, 'Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets', Current Protocols, vol. 2, no. 12, e621. https://doi.org/10.1002/cpz1.621

APA

Montalbano, S., Sánchez, X. C., Vaez, M., Helenius, D., Werge, T., & Ingason, A. (2022). Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets. Current Protocols, 2(12), Article e621. https://doi.org/10.1002/cpz1.621

Vancouver

Montalbano S, Sánchez XC, Vaez M, Helenius D, Werge T, Ingason A. Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets. Current Protocols. 2022;2(12):e621. https://doi.org/10.1002/cpz1.621

Author

Montalbano, Simone ; Sánchez, Xabier Calle ; Vaez, Morteza ; Helenius, Dorte ; Werge, Thomas ; Ingason, Andrés. / Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets. In: Current Protocols. 2022 ; Vol. 2, No. 12, e621.

Bibtex

@article{97fca4ec92004e72ab8c0b91d4d9c800,
title = "Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets",
abstract = "Structural variations, including recurrent Copy Number Variants (CNVs) at specific genomic loci, have been found to be associated with increased risk of several diseases and syndromes. CNV carrier status can be determined in large collections of samples using SNP arrays and, more recently, sequencing data. Although there is some consensus among researchers about the essential steps required in such analysis (i.e., CNV calling, filtering of putative carriers, and visual validation using intensity data plots of the genomic region), standard methodologies and processes to control the quality and consistency of the results are lacking. Here, we present a comprehensive and user-friendly protocol that we have refined from our extensive research experience in the field. We cover every aspect of the analysis, from input data curation to final results. For each step, we highlight which parameters affect the analysis the most and how different settings may lead to different results. We provide a pipeline to run the complete analysis with effective (but customizable) pre-sets. We present software that we developed to better handle and filter putative CNV carriers and perform visual inspection to validate selected candidates. Finally, we describe methods to evaluate the critical sections and actions to counterbalance potential problems. The current implementation is focused on Illumina SNP array data. All the presented software is freely available and provided in a ready-to-use docker container.",
keywords = "bioinformatics pipeline, CNVs, SNPs, structural variation",
author = "Montalbano, Simone and S{\'a}nchez, Xabier Calle and Vaez, Morteza and Helenius, Dorte and Werge, Thomas and Ingason, Andr{\'e}s",
note = "Publisher Copyright: {\textcopyright} 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.",
year = "2022",
doi = "10.1002/cpz1.621",
language = "English",
volume = "2",
journal = "Current Protocols",
issn = "2691-1299",
publisher = "Wiley",
number = "12",
pages = "e621",
}

RIS

TY - JOUR

T1 - Accurate and Effective Detection of Recurrent Copy Number Variants in Large SNP Genotype Datasets

AU - Montalbano, Simone

AU - Sánchez, Xabier Calle

AU - Vaez, Morteza

AU - Helenius, Dorte

AU - Werge, Thomas

AU - Ingason, Andrés

N1 - Publisher Copyright: © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

PY - 2022

Y1 - 2022

AB - Structural variations, including recurrent Copy Number Variants (CNVs) at specific genomic loci, have been found to be associated with increased risk of several diseases and syndromes. CNV carrier status can be determined in large collections of samples using SNP arrays and, more recently, sequencing data. Although there is some consensus among researchers about the essential steps required in such analysis (i.e., CNV calling, filtering of putative carriers, and visual validation using intensity data plots of the genomic region), standard methodologies and processes to control the quality and consistency of the results are lacking. Here, we present a comprehensive and user-friendly protocol that we have refined from our extensive research experience in the field. We cover every aspect of the analysis, from input data curation to final results. For each step, we highlight which parameters affect the analysis the most and how different settings may lead to different results. We provide a pipeline to run the complete analysis with effective (but customizable) pre-sets. We present software that we developed to better handle and filter putative CNV carriers and perform visual inspection to validate selected candidates. Finally, we describe methods to evaluate the critical sections and actions to counterbalance potential problems. The current implementation is focused on Illumina SNP array data. All the presented software is freely available and provided in a ready-to-use docker container.

KW - bioinformatics pipeline

KW - CNVs

KW - SNPs

KW - structural variation

U2 - 10.1002/cpz1.621

DO - 10.1002/cpz1.621

M3 - Journal article

C2 - 36469582

AN - SCOPUS:85143552042

VL - 2

JO - Current Protocols

JF - Current Protocols

SN - 2691-1299

IS - 12

M1 - e621

ER -
