Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. / An, Ulzee; Pazokitoroudi, Ali; Alvarez, Marcus; Huang, Lianyun; Bacanu, Silviu; Schork, Andrew J.; Kendler, Kenneth; Pajukanta, Päivi; Flint, Jonathan; Zaitlen, Noah; Cai, Na; Dahl, Andy; Sankararaman, Sriram.

In: Nature Genetics, Vol. 55, No. 12, 2023, p. 2269-2276.

Harvard

An, U, Pazokitoroudi, A, Alvarez, M, Huang, L, Bacanu, S, Schork, AJ, Kendler, K, Pajukanta, P, Flint, J, Zaitlen, N, Cai, N, Dahl, A & Sankararaman, S 2023, 'Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries', Nature Genetics, vol. 55, no. 12, pp. 2269-2276. https://doi.org/10.1038/s41588-023-01558-w

APA

An, U., Pazokitoroudi, A., Alvarez, M., Huang, L., Bacanu, S., Schork, A. J., Kendler, K., Pajukanta, P., Flint, J., Zaitlen, N., Cai, N., Dahl, A., & Sankararaman, S. (2023). Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. Nature Genetics, 55(12), 2269-2276. https://doi.org/10.1038/s41588-023-01558-w

Vancouver

An U, Pazokitoroudi A, Alvarez M, Huang L, Bacanu S, Schork AJ et al. Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. Nature Genetics. 2023;55(12):2269-2276. https://doi.org/10.1038/s41588-023-01558-w

Author

An, Ulzee ; Pazokitoroudi, Ali ; Alvarez, Marcus ; Huang, Lianyun ; Bacanu, Silviu ; Schork, Andrew J. ; Kendler, Kenneth ; Pajukanta, Päivi ; Flint, Jonathan ; Zaitlen, Noah ; Cai, Na ; Dahl, Andy ; Sankararaman, Sriram. / Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. In: Nature Genetics. 2023 ; Vol. 55, No. 12. pp. 2269-2276.

Bibtex

@article{fa4a3ac702494cf1bcda394830777265,

title = "Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries",

abstract = "Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or {\textquoteleft}fill-in{\textquoteright} missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across {\textasciitilde}300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.",

author = "Ulzee An and Ali Pazokitoroudi and Marcus Alvarez and Lianyun Huang and Silviu Bacanu and Schork, {Andrew J.} and Kenneth Kendler and P{\"a}ivi Pajukanta and Jonathan Flint and Noah Zaitlen and Na Cai and Andy Dahl and Sriram Sankararaman",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",

year = "2023",

doi = "10.1038/s41588-023-01558-w",

language = "English",

volume = "55",

pages = "2269--2276",

journal = "Nature Genetics",

issn = "1061-4036",

publisher = "Nature Publishing Group",

number = "12",

}

RIS

TY - JOUR

T1 - Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries

AU - An, Ulzee

AU - Pazokitoroudi, Ali

AU - Alvarez, Marcus

AU - Huang, Lianyun

AU - Bacanu, Silviu

AU - Schork, Andrew J.

AU - Kendler, Kenneth

AU - Pajukanta, Päivi

AU - Flint, Jonathan

AU - Zaitlen, Noah

AU - Cai, Na

AU - Dahl, Andy

AU - Sankararaman, Sriram

PY - 2023

Y1 - 2023

N2 - Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.

AB - Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.

U2 - 10.1038/s41588-023-01558-w

DO - 10.1038/s41588-023-01558-w

M3 - Journal article

C2 - 37985819

AN - SCOPUS:85177165720

VL - 55

SP - 2269

EP - 2276

JO - Nature Genetics

JF - Nature Genetics

SN - 1061-4036

IS - 12

ER -

ID: 379868226

Globe Institute