Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. / An, Ulzee; Pazokitoroudi, Ali; Alvarez, Marcus; Huang, Lianyun; Bacanu, Silviu; Schork, Andrew J.; Kendler, Kenneth; Pajukanta, Päivi; Flint, Jonathan; Zaitlen, Noah; Cai, Na; Dahl, Andy; Sankararaman, Sriram.
In: Nature Genetics, Vol. 55, No. 12, 2023, p. 2269-2276.
RIS
TY - JOUR
T1 - Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries
AU - An, Ulzee
AU - Pazokitoroudi, Ali
AU - Alvarez, Marcus
AU - Huang, Lianyun
AU - Bacanu, Silviu
AU - Schork, Andrew J.
AU - Kendler, Kenneth
AU - Pajukanta, Päivi
AU - Flint, Jonathan
AU - Zaitlen, Noah
AU - Cai, Na
AU - Dahl, Andy
AU - Sankararaman, Sriram
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023
Y1 - 2023
N2 - Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
AB - Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
U2 - 10.1038/s41588-023-01558-w
DO - 10.1038/s41588-023-01558-w
M3 - Journal article
C2 - 37985819
AN - SCOPUS:85177165720
VL - 55
SP - 2269
EP - 2276
JO - Nature Genetics
JF - Nature Genetics
SN - 1061-4036
IS - 12
ER -
ID: 379868226