Standard
Towards complete and error-free genome assemblies of all vertebrate species. / Rhie, Arang; McCarthy, Shane A.; Fedrigo, Olivier; Damas, Joana; Formenti, Giulio; Koren, Sergey; Uliano-Silva, Marcela; Chow, William; Fungtammasan, Arkarachai; Kim, Juwan; Lee, Chul; Ko, Byung June; Chaisson, Mark; Gedman, Gregory L.; Cantin, Lindsey J.; Thibaud-Nissen, Francoise; Haggerty, Leanne; Bista, Iliana; Smith, Michelle; Haase, Bettina; Mountcastle, Jacquelyn; Winkler, Sylke; Paez, Sadye; Howard, Jason; Vernes, Sonja C.; Lama, Tanya M.; Grutzner, Frank; Warren, Wesley C.; Balakrishnan, Christopher N.; Burt, Dave; George, Julia M.; Biegler, Matthew T.; Iorns, David; Digby, Andrew; Eason, Daryl; Robertson, Bruce; Edwards, Taylor; Wilkinson, Mark; Turner, George; Meyer, Axel; Kautt, Andreas F.; Franchini, Paolo; Detrich, H. William; Svardal, Hannes; Wagner, Maximilian; Naylor, Gavin J.P.; Pippel, Martin; Malinsky, Milan; Mooney, Mark; Simbirsky, Maria; Hannigan, Brett T.; Pesout, Trevor; Houck, Marlys; Misuraca, Ann; Kingan, Sarah B.; Hall, Richard; Kronenberg, Zev; Sović, Ivan; Dunn, Christopher; Ning, Zemin; Hastie, Alex; Lee, Joyce; Selvaraj, Siddarth; Green, Richard E.; Putnam, Nicholas H.; Gut, Ivo; Ghurye, Jay; Garrison, Erik; Sims, Ying; Collins, Joanna; Pelan, Sarah; Torrance, James; Tracey, Alan; Wood, Jonathan; Dagnew, Robel E.; Guan, Dengfeng; London, Sarah E.; Clayton, David F.; Mello, Claudio V.; Friedrich, Samantha R.; Lovell, Peter V.; Osipova, Ekaterina; Al-Ajli, Farooq O.; Secomandi, Simona; Kim, Heebal; Theofanopoulou, Constantina; Hiller, Michael; Zhou, Yang; Harris, Robert S.; Makova, Kateryna D.; Medvedev, Paul; Hoffman, Jinna; Masterson, Patrick; Clark, Karen; Martin, Fergal; Howe, Kevin; Flicek, Paul; Walenz, Brian P.; Kwak, Woori; Clawson, Hiram; Diekhans, Mark; Nassar, Luis; Paten, Benedict; Kraus, Robert H. S.; Crawford, Andrew J.; Gilbert, M. Thomas P.; Zhang, Guojie; Venkatesh, Byrappa; Murphy, Robert W.; Koepfli, Klaus-Peter; Shapiro, Beth; Johnson, Warren E.; Di Palma, Federica; Marques-Bonet, Tomas; Teeling, Emma C.; Warnow, Tandy; Graves, Jennifer Marshall; Ryder, Oliver A.; Haussler, David; O’Brien, Stephen J.; Korlach, Jonas; Lewin, Harris A.; Howe, Kerstin; Myers, Eugene W.; Durbin, Richard; Phillippy, Adam M.; Jarvis, Erich D. In: Nature, Vol. 592, No. 7856, 2021, p. 737-746.
Research output: Contribution to journal › Journal article › Research › peer-review
Harvard
Rhie, A, McCarthy, SA, Fedrigo, O, Damas, J, Formenti, G, Koren, S, Uliano-Silva, M, Chow, W, Fungtammasan, A, Kim, J, Lee, C, Ko, BJ, Chaisson, M, Gedman, GL, Cantin, LJ, Thibaud-Nissen, F, Haggerty, L, Bista, I, Smith, M, Haase, B, Mountcastle, J, Winkler, S, Paez, S, Howard, J, Vernes, SC, Lama, TM, Grutzner, F, Warren, WC, Balakrishnan, CN, Burt, D, George, JM, Biegler, MT, Iorns, D, Digby, A, Eason, D, Robertson, B, Edwards, T, Wilkinson, M, Turner, G, Meyer, A, Kautt, AF, Franchini, P, Detrich, HW, Svardal, H, Wagner, M, Naylor, GJP, Pippel, M, Malinsky, M, Mooney, M, Simbirsky, M, Hannigan, BT, Pesout, T, Houck, M, Misuraca, A, Kingan, SB, Hall, R, Kronenberg, Z, Sović, I, Dunn, C, Ning, Z, Hastie, A, Lee, J, Selvaraj, S, Green, RE, Putnam, NH, Gut, I, Ghurye, J, Garrison, E, Sims, Y, Collins, J, Pelan, S, Torrance, J, Tracey, A, Wood, J, Dagnew, RE, Guan, D, London, SE, Clayton, DF, Mello, CV, Friedrich, SR, Lovell, PV, Osipova, E, Al-Ajli, FO, Secomandi, S, Kim, H, Theofanopoulou, C, Hiller, M, Zhou, Y, Harris, RS, Makova, KD, Medvedev, P, Hoffman, J, Masterson, P, Clark, K, Martin, F, Howe, K, Flicek, P, Walenz, BP, Kwak, W, Clawson, H, Diekhans, M, Nassar, L, Paten, B, Kraus, RHS, Crawford, AJ, Gilbert, MTP, Zhang, G, Venkatesh, B, Murphy, RW, Koepfli, K-P, Shapiro, B, Johnson, WE, Di Palma, F, Marques-Bonet, T, Teeling, EC, Warnow, T, Graves, JM, Ryder, OA, Haussler, D, O’Brien, SJ, Korlach, J, Lewin, HA, Howe, K, Myers, EW, Durbin, R, Phillippy, AM & Jarvis, ED 2021, 'Towards complete and error-free genome assemblies of all vertebrate species', Nature, vol. 592, no. 7856, pp. 737-746. https://doi.org/10.1038/s41586-021-03451-0
APA
Rhie, A., McCarthy, S. A., Fedrigo, O., Damas, J., Formenti, G., Koren, S., Uliano-Silva, M., Chow, W., Fungtammasan, A., Kim, J., Lee, C., Ko, B. J., Chaisson, M., Gedman, G. L., Cantin, L. J., Thibaud-Nissen, F., Haggerty, L., Bista, I., Smith, M., ... Jarvis, E. D. (2021). Towards complete and error-free genome assemblies of all vertebrate species. Nature, 592(7856), 737-746. https://doi.org/10.1038/s41586-021-03451-0
Vancouver
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737-746. https://doi.org/10.1038/s41586-021-03451-0
Author
Rhie, Arang ; McCarthy, Shane A. ; Fedrigo, Olivier ; Damas, Joana ; Formenti, Giulio ; Koren, Sergey ; Uliano-Silva, Marcela ; Chow, William ; Fungtammasan, Arkarachai ; Kim, Juwan ; Lee, Chul ; Ko, Byung June ; Chaisson, Mark ; Gedman, Gregory L. ; Cantin, Lindsey J. ; Thibaud-Nissen, Francoise ; Haggerty, Leanne ; Bista, Iliana ; Smith, Michelle ; Haase, Bettina ; Mountcastle, Jacquelyn ; Winkler, Sylke ; Paez, Sadye ; Howard, Jason ; Vernes, Sonja C. ; Lama, Tanya M. ; Grutzner, Frank ; Warren, Wesley C. ; Balakrishnan, Christopher N. ; Burt, Dave ; George, Julia M. ; Biegler, Matthew T. ; Iorns, David ; Digby, Andrew ; Eason, Daryl ; Robertson, Bruce ; Edwards, Taylor ; Wilkinson, Mark ; Turner, George ; Meyer, Axel ; Kautt, Andreas F. ; Franchini, Paolo ; Detrich, H. William ; Svardal, Hannes ; Wagner, Maximilian ; Naylor, Gavin J.P. ; Pippel, Martin ; Malinsky, Milan ; Mooney, Mark ; Simbirsky, Maria ; Hannigan, Brett T. ; Pesout, Trevor ; Houck, Marlys ; Misuraca, Ann ; Kingan, Sarah B. ; Hall, Richard ; Kronenberg, Zev ; Sović, Ivan ; Dunn, Christopher ; Ning, Zemin ; Hastie, Alex ; Lee, Joyce ; Selvaraj, Siddarth ; Green, Richard E. ; Putnam, Nicholas H. ; Gut, Ivo ; Ghurye, Jay ; Garrison, Erik ; Sims, Ying ; Collins, Joanna ; Pelan, Sarah ; Torrance, James ; Tracey, Alan ; Wood, Jonathan ; Dagnew, Robel E. ; Guan, Dengfeng ; London, Sarah E. ; Clayton, David F. ; Mello, Claudio V. ; Friedrich, Samantha R. ; Lovell, Peter V. ; Osipova, Ekaterina ; Al-Ajli, Farooq O. ; Secomandi, Simona ; Kim, Heebal ; Theofanopoulou, Constantina ; Hiller, Michael ; Zhou, Yang ; Harris, Robert S. ; Makova, Kateryna D. ; Medvedev, Paul ; Hoffman, Jinna ; Masterson, Patrick ; Clark, Karen ; Martin, Fergal ; Howe, Kevin ; Flicek, Paul ; Walenz, Brian P. ; Kwak, Woori ; Clawson, Hiram ; Diekhans, Mark ; Nassar, Luis ; Paten, Benedict ; Kraus, Robert H. S. ; Crawford, Andrew J. ; Gilbert, M. Thomas P. ; Zhang, Guojie ; Venkatesh, Byrappa ; Murphy, Robert W. ; Koepfli, Klaus-Peter ; Shapiro, Beth ; Johnson, Warren E. ; Di Palma, Federica ; Marques-Bonet, Tomas ; Teeling, Emma C. ; Warnow, Tandy ; Graves, Jennifer Marshall ; Ryder, Oliver A. ; Haussler, David ; O’Brien, Stephen J. ; Korlach, Jonas ; Lewin, Harris A. ; Howe, Kerstin ; Myers, Eugene W. ; Durbin, Richard ; Phillippy, Adam M. ; Jarvis, Erich D. / Towards complete and error-free genome assemblies of all vertebrate species. In: Nature. 2021 ; Vol. 592, No. 7856. pp. 737-746.
Bibtex
@article{029e9e003e2a46798a938f6a3eccb832,
title = "Towards complete and error-free genome assemblies of all vertebrate species",
abstract = "High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.",
author = "Arang Rhie and McCarthy, {Shane A.} and Olivier Fedrigo and Joana Damas and Giulio Formenti and Sergey Koren and Marcela Uliano-Silva and William Chow and Arkarachai Fungtammasan and Juwan Kim and Chul Lee and Ko, {Byung June} and Mark Chaisson and Gedman, {Gregory L.} and Cantin, {Lindsey J.} and Francoise Thibaud-Nissen and Leanne Haggerty and Iliana Bista and Michelle Smith and Bettina Haase and Jacquelyn Mountcastle and Sylke Winkler and Sadye Paez and Jason Howard and Vernes, {Sonja C.} and Lama, {Tanya M.} and Frank Grutzner and Warren, {Wesley C.} and Balakrishnan, {Christopher N.} and Dave Burt and George, {Julia M.} and Biegler, {Matthew T.} and David Iorns and Andrew Digby and Daryl Eason and Bruce Robertson and Taylor Edwards and Mark Wilkinson and George Turner and Axel Meyer and Kautt, {Andreas F.} and Paolo Franchini and Detrich, {H. William} and Hannes Svardal and Maximilian Wagner and Naylor, {Gavin J.P.} and Martin Pippel and Milan Malinsky and Mark Mooney and Maria Simbirsky and Hannigan, {Brett T.} and Trevor Pesout and Marlys Houck and Ann Misuraca and Kingan, {Sarah B.} and Richard Hall and Zev Kronenberg and Ivan Sovi{\'c} and Christopher Dunn and Zemin Ning and Alex Hastie and Joyce Lee and Siddarth Selvaraj and Green, {Richard E.} and Putnam, {Nicholas H.} and Ivo Gut and Jay Ghurye and Erik Garrison and Ying Sims and Joanna Collins and Sarah Pelan and James Torrance and Alan Tracey and Jonathan Wood and Dagnew, {Robel E.} and Dengfeng Guan and London, {Sarah E.} and Clayton, {David F.} and Mello, {Claudio V.} and Friedrich, {Samantha R.} and Lovell, {Peter V.} and Ekaterina Osipova and Al-Ajli, {Farooq O.} and Simona Secomandi and Heebal Kim and Constantina Theofanopoulou and Michael Hiller and Yang Zhou and Harris, {Robert S.} and Makova, {Kateryna D.} and Paul Medvedev and Jinna Hoffman and Patrick Masterson and Karen Clark and Fergal Martin and Kevin Howe and Paul Flicek and Walenz, {Brian P.} and Woori Kwak and Hiram Clawson 
and Mark Diekhans and Luis Nassar and Benedict Paten and Kraus, {Robert H. S.} and Crawford, {Andrew J.} and Gilbert, {M. Thomas P.} and Guojie Zhang and Byrappa Venkatesh and Murphy, {Robert W.} and Klaus-Peter Koepfli and Beth Shapiro and Johnson, {Warren E.} and {Di Palma}, Federica and Tomas Marques-Bonet and Teeling, {Emma C.} and Tandy Warnow and Graves, {Jennifer Marshall} and Ryder, {Oliver A.} and David Haussler and O{\textquoteright}Brien, {Stephen J.} and Jonas Korlach and Lewin, {Harris A.} and Kerstin Howe and Myers, {Eugene W.} and Richard Durbin and Phillippy, {Adam M.} and Jarvis, {Erich D.}",
note = "Funding Information: Acknowledgements We thank the following persons for feedback and support: R. Johnson, E. Karlsson, K. Lindblad Toh, W. Jun, I. Korf, W. Haerty, G. Etherington, B. Clavijo, and A. Komissarov for discussions in the early stages of the project; R. Fuller for help with the G10K website maintenance, and H. Segal for help with VGP website development; M. Linh Pham for help with initial grant writing; L. Shalmiyev for administrative help; D. Church, G. Kol, K. Baruch, O. Barad, I. Liachko, E. Muzychenko, S. Garg, and M. Kolmogorov for preliminary analyses performed on one or more genomes; K. Oliver, C. Corton and J. Skelton for data generation; E. Harry for technical support in scaff10x and Pretext; C. Mazzoni for coordinating students and training at Leibniz Institute for Zoo and Wildlife Research and Berlin Center for Genomics in Biodiversity Research; and M. Driller, C. Caswara, M. Vafadar, N. Hill, D. De Panis, A. Whibley, B. Maloney, C. Mitchell, G. Gallo, J. Gaige, K. Amoako-Boadu, M. Jose Gomez, M. Montero, D. Ratnikov, S. Brown, S. Zylka, S. Marcus, and T. Carrasco for completing training and testing the VGP pipeline by producing ordinal representative genome assemblies not described in this manuscript. We thank our company partners (listed below), NCBI, EBI, and Amazon AWS, including AWS for sponsoring sequence storage. J. Fekecs and D. Leja created the animal images, and J. Kim modified them to silhouettes. We thank them for their permission to publish. A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R\&D Project through KHIDI, funded by the Ministry of Health \& Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Ocean and Fisheries, Republic of Korea (20180430). M.C. was supported by Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontier Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries \& Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP - RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF 456-2016). The following authors{\textquoteright} contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21 DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-W{\"u}rttemberg and the Universities of the State of Baden-W{\"u}rttemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union{\textquoteright}s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia Mar{\'i}a de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d{\textquoteright}Universitats i Recerca and CERCA Programme del Departament d{\textquoteright}Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J. M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C). 
Complementary sequencing support for the Anna{\textquoteright}s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. We thank Le Comit{\'e} Scientifique R{\'e}gional du Patrimoine Naturel and Direction de l{\textquoteright}Environnement, de l{\textquoteright}Am{\'e}nagement et du Logement, Guyane for research approvals and export permits. Publisher Copyright: {\textcopyright} 2021, The Author(s).",
year = "2021",
doi = "10.1038/s41586-021-03451-0",
language = "English",
volume = "592",
pages = "737--746",
journal = "Nature",
issn = "0028-0836",
publisher = "Nature Publishing Group",
number = "7856",
}
RIS
TY - JOUR
T1 - Towards complete and error-free genome assemblies of all vertebrate species
AU - Rhie, Arang
AU - McCarthy, Shane A.
AU - Fedrigo, Olivier
AU - Damas, Joana
AU - Formenti, Giulio
AU - Koren, Sergey
AU - Uliano-Silva, Marcela
AU - Chow, William
AU - Fungtammasan, Arkarachai
AU - Kim, Juwan
AU - Lee, Chul
AU - Ko, Byung June
AU - Chaisson, Mark
AU - Gedman, Gregory L.
AU - Cantin, Lindsey J.
AU - Thibaud-Nissen, Francoise
AU - Haggerty, Leanne
AU - Bista, Iliana
AU - Smith, Michelle
AU - Haase, Bettina
AU - Mountcastle, Jacquelyn
AU - Winkler, Sylke
AU - Paez, Sadye
AU - Howard, Jason
AU - Vernes, Sonja C.
AU - Lama, Tanya M.
AU - Grutzner, Frank
AU - Warren, Wesley C.
AU - Balakrishnan, Christopher N.
AU - Burt, Dave
AU - George, Julia M.
AU - Biegler, Matthew T.
AU - Iorns, David
AU - Digby, Andrew
AU - Eason, Daryl
AU - Robertson, Bruce
AU - Edwards, Taylor
AU - Wilkinson, Mark
AU - Turner, George
AU - Meyer, Axel
AU - Kautt, Andreas F.
AU - Franchini, Paolo
AU - Detrich, H. William
AU - Svardal, Hannes
AU - Wagner, Maximilian
AU - Naylor, Gavin J.P.
AU - Pippel, Martin
AU - Malinsky, Milan
AU - Mooney, Mark
AU - Simbirsky, Maria
AU - Hannigan, Brett T.
AU - Pesout, Trevor
AU - Houck, Marlys
AU - Misuraca, Ann
AU - Kingan, Sarah B.
AU - Hall, Richard
AU - Kronenberg, Zev
AU - Sović, Ivan
AU - Dunn, Christopher
AU - Ning, Zemin
AU - Hastie, Alex
AU - Lee, Joyce
AU - Selvaraj, Siddarth
AU - Green, Richard E.
AU - Putnam, Nicholas H.
AU - Gut, Ivo
AU - Ghurye, Jay
AU - Garrison, Erik
AU - Sims, Ying
AU - Collins, Joanna
AU - Pelan, Sarah
AU - Torrance, James
AU - Tracey, Alan
AU - Wood, Jonathan
AU - Dagnew, Robel E.
AU - Guan, Dengfeng
AU - London, Sarah E.
AU - Clayton, David F.
AU - Mello, Claudio V.
AU - Friedrich, Samantha R.
AU - Lovell, Peter V.
AU - Osipova, Ekaterina
AU - Al-Ajli, Farooq O.
AU - Secomandi, Simona
AU - Kim, Heebal
AU - Theofanopoulou, Constantina
AU - Hiller, Michael
AU - Zhou, Yang
AU - Harris, Robert S.
AU - Makova, Kateryna D.
AU - Medvedev, Paul
AU - Hoffman, Jinna
AU - Masterson, Patrick
AU - Clark, Karen
AU - Martin, Fergal
AU - Howe, Kevin
AU - Flicek, Paul
AU - Walenz, Brian P.
AU - Kwak, Woori
AU - Clawson, Hiram
AU - Diekhans, Mark
AU - Nassar, Luis
AU - Paten, Benedict
AU - Kraus, Robert H. S.
AU - Crawford, Andrew J.
AU - Gilbert, M. Thomas P.
AU - Zhang, Guojie
AU - Venkatesh, Byrappa
AU - Murphy, Robert W.
AU - Koepfli, Klaus-Peter
AU - Shapiro, Beth
AU - Johnson, Warren E.
AU - Di Palma, Federica
AU - Marques-Bonet, Tomas
AU - Teeling, Emma C.
AU - Warnow, Tandy
AU - Graves, Jennifer Marshall
AU - Ryder, Oliver A.
AU - Haussler, David
AU - O’Brien, Stephen J.
AU - Korlach, Jonas
AU - Lewin, Harris A.
AU - Howe, Kerstin
AU - Myers, Eugene W.
AU - Durbin, Richard
AU - Phillippy, Adam M.
AU - Jarvis, Erich D.
N1 - Funding Information:
Acknowledgements We thank the following persons for feedback and support: R. Johnson, E. Karlsson, K. Lindblad Toh, W. Jun, I. Korf, W. Haerty, G. Etherington, B. Clavijo, and A. Komissarov for discussions in the early stages of the project; R. Fuller for help with the G10K website maintenance, and H. Segal for help with VGP website development; M. Linh Pham for help with initial grant writing; L. Shalmiyev for administrative help; D. Church, G. Kol, K. Baruch, O. Barad, I. Liachko, E. Muzychenko, S. Garg, and M. Kolmogorov for preliminary analyses performed on one or more genomes; K. Oliver, C. Corton and J. Skelton for data generation; E. Harry for technical support in scaff10x and Pretext; C. Mazzoni for coordinating students and training at Leibniz Institute for Zoo and Wildlife Research and Berlin Center for Genomics in Biodiversity Research; and M. Driller, C. Caswara, M. Vafadar, N. Hill, D. De Panis, A. Whibley, B. Maloney, C. Mitchell, G. Gallo, J. Gaige, K. Amoako-Boadu, M. Jose Gomez, M. Montero, D. Ratnikov, S. Brown, S. Zylka, S. Marcus, and T. Carrasco for completing training and testing the VGP pipeline by producing ordinal representative genome assemblies not described in this manuscript. We thank our company partners (listed below), NCBI, EBI, and Amazon AWS, including AWS for sponsoring sequence storage. J. Fekecs and D. Leja created the animal images, and J. Kim modified them to silhouettes. We thank them for their permission to publish. A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Ocean and Fisheries, Republic of Korea (20180430). M.C. was supported by Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontier Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP - RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF 456-2016). The following authors’ contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21 DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia María de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J. M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C).
Complementary sequencing support for the Anna’s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. We thank Le Comité Scientifique Régional du Patrimoine Naturel and Direction de l’Environnement, de l’Aménagement et du Logement, Guyane for research approvals and export permits.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021
Y1 - 2021
N2 - High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
AB - High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
U2 - 10.1038/s41586-021-03451-0
DO - 10.1038/s41586-021-03451-0
M3 - Journal article
C2 - 33911273
AN - SCOPUS:85103951723
VL - 592
SP - 737
EP - 746
JO - Nature
JF - Nature
SN - 0028-0836
IS - 7856
ER -