Towards complete and error-free genome assemblies of all vertebrate species

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Towards complete and error-free genome assemblies of all vertebrate species. / Rhie, Arang; McCarthy, Shane A.; Fedrigo, Olivier; Damas, Joana; Formenti, Giulio; Koren, Sergey; Uliano-Silva, Marcela; Chow, William; Fungtammasan, Arkarachai; Kim, Juwan; Lee, Chul; Ko, Byung June; Chaisson, Mark; Gedman, Gregory L.; Cantin, Lindsey J.; Thibaud-Nissen, Francoise; Haggerty, Leanne; Bista, Iliana; Smith, Michelle; Haase, Bettina; Mountcastle, Jacquelyn; Winkler, Sylke; Paez, Sadye; Howard, Jason; Vernes, Sonja C.; Lama, Tanya M.; Grutzner, Frank; Warren, Wesley C.; Balakrishnan, Christopher N.; Burt, Dave; George, Julia M.; Biegler, Matthew T.; Iorns, David; Digby, Andrew; Eason, Daryl; Robertson, Bruce; Edwards, Taylor; Wilkinson, Mark; Turner, George; Meyer, Axel; Kautt, Andreas F.; Franchini, Paolo; Detrich, H. William; Svardal, Hannes; Wagner, Maximilian; Naylor, Gavin J.P.; Pippel, Martin; Malinsky, Milan; Mooney, Mark; Simbirsky, Maria; Hannigan, Brett T.; Pesout, Trevor; Houck, Marlys; Misuraca, Ann; Kingan, Sarah B.; Hall, Richard; Kronenberg, Zev; Sović, Ivan; Dunn, Christopher; Ning, Zemin; Hastie, Alex; Lee, Joyce; Selvaraj, Siddarth; Green, Richard E.; Putnam, Nicholas H.; Gut, Ivo; Ghurye, Jay; Garrison, Erik; Sims, Ying; Collins, Joanna; Pelan, Sarah; Torrance, James; Tracey, Alan; Wood, Jonathan; Dagnew, Robel E.; Guan, Dengfeng; London, Sarah E.; Clayton, David F.; Mello, Claudio V.; Friedrich, Samantha R.; Lovell, Peter V.; Osipova, Ekaterina; Al-Ajli, Farooq O.; Secomandi, Simona; Kim, Heebal; Theofanopoulou, Constantina; Hiller, Michael; Zhou, Yang; Harris, Robert S.; Makova, Kateryna D.; Medvedev, Paul; Hoffman, Jinna; Masterson, Patrick; Clark, Karen; Martin, Fergal; Howe, Kevin; Flicek, Paul; Walenz, Brian P.; Kwak, Woori; Clawson, Hiram; Diekhans, Mark; Nassar, Luis; Paten, Benedict; Kraus, Robert H. S.; Crawford, Andrew J.; Gilbert, M. Thomas P.; Zhang, Guojie; Venkatesh, Byrappa; Murphy, Robert W.; Koepfli, Klaus-Peter; Shapiro, Beth; Johnson, Warren E.; Di Palma, Federica; Marques-Bonet, Tomas; Teeling, Emma C.; Warnow, Tandy; Graves, Jennifer Marshall; Ryder, Oliver A.; Haussler, David; O’Brien, Stephen J.; Korlach, Jonas; Lewin, Harris A.; Howe, Kerstin; Myers, Eugene W.; Durbin, Richard; Phillippy, Adam M.; Jarvis, Erich D.

In: Nature, Vol. 592, No. 7856, 2021, p. 737-746.

Harvard

Rhie, A, McCarthy, SA, Fedrigo, O, Damas, J, Formenti, G, Koren, S, Uliano-Silva, M, Chow, W, Fungtammasan, A, Kim, J, Lee, C, Ko, BJ, Chaisson, M, Gedman, GL, Cantin, LJ, Thibaud-Nissen, F, Haggerty, L, Bista, I, Smith, M, Haase, B, Mountcastle, J, Winkler, S, Paez, S, Howard, J, Vernes, SC, Lama, TM, Grutzner, F, Warren, WC, Balakrishnan, CN, Burt, D, George, JM, Biegler, MT, Iorns, D, Digby, A, Eason, D, Robertson, B, Edwards, T, Wilkinson, M, Turner, G, Meyer, A, Kautt, AF, Franchini, P, Detrich, HW, Svardal, H, Wagner, M, Naylor, GJP, Pippel, M, Malinsky, M, Mooney, M, Simbirsky, M, Hannigan, BT, Pesout, T, Houck, M, Misuraca, A, Kingan, SB, Hall, R, Kronenberg, Z, Sović, I, Dunn, C, Ning, Z, Hastie, A, Lee, J, Selvaraj, S, Green, RE, Putnam, NH, Gut, I, Ghurye, J, Garrison, E, Sims, Y, Collins, J, Pelan, S, Torrance, J, Tracey, A, Wood, J, Dagnew, RE, Guan, D, London, SE, Clayton, DF, Mello, CV, Friedrich, SR, Lovell, PV, Osipova, E, Al-Ajli, FO, Secomandi, S, Kim, H, Theofanopoulou, C, Hiller, M, Zhou, Y, Harris, RS, Makova, KD, Medvedev, P, Hoffman, J, Masterson, P, Clark, K, Martin, F, Howe, K, Flicek, P, Walenz, BP, Kwak, W, Clawson, H, Diekhans, M, Nassar, L, Paten, B, Kraus, RHS, Crawford, AJ, Gilbert, MTP, Zhang, G, Venkatesh, B, Murphy, RW, Koepfli, K-P, Shapiro, B, Johnson, WE, Di Palma, F, Marques-Bonet, T, Teeling, EC, Warnow, T, Graves, JM, Ryder, OA, Haussler, D, O’Brien, SJ, Korlach, J, Lewin, HA, Howe, K, Myers, EW, Durbin, R, Phillippy, AM & Jarvis, ED 2021, 'Towards complete and error-free genome assemblies of all vertebrate species', Nature, vol. 592, no. 7856, pp. 737-746. https://doi.org/10.1038/s41586-021-03451-0

APA

Rhie, A., McCarthy, S. A., Fedrigo, O., Damas, J., Formenti, G., Koren, S., Uliano-Silva, M., Chow, W., Fungtammasan, A., Kim, J., Lee, C., Ko, B. J., Chaisson, M., Gedman, G. L., Cantin, L. J., Thibaud-Nissen, F., Haggerty, L., Bista, I., Smith, M., ... Jarvis, E. D. (2021). Towards complete and error-free genome assemblies of all vertebrate species. Nature, 592(7856), 737-746. https://doi.org/10.1038/s41586-021-03451-0

Vancouver

Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737-746. https://doi.org/10.1038/s41586-021-03451-0

Author

Rhie, Arang ; McCarthy, Shane A. ; Fedrigo, Olivier ; Damas, Joana ; Formenti, Giulio ; Koren, Sergey ; Uliano-Silva, Marcela ; Chow, William ; Fungtammasan, Arkarachai ; Kim, Juwan ; Lee, Chul ; Ko, Byung June ; Chaisson, Mark ; Gedman, Gregory L. ; Cantin, Lindsey J. ; Thibaud-Nissen, Francoise ; Haggerty, Leanne ; Bista, Iliana ; Smith, Michelle ; Haase, Bettina ; Mountcastle, Jacquelyn ; Winkler, Sylke ; Paez, Sadye ; Howard, Jason ; Vernes, Sonja C. ; Lama, Tanya M. ; Grutzner, Frank ; Warren, Wesley C. ; Balakrishnan, Christopher N. ; Burt, Dave ; George, Julia M. ; Biegler, Matthew T. ; Iorns, David ; Digby, Andrew ; Eason, Daryl ; Robertson, Bruce ; Edwards, Taylor ; Wilkinson, Mark ; Turner, George ; Meyer, Axel ; Kautt, Andreas F. ; Franchini, Paolo ; Detrich, H. William ; Svardal, Hannes ; Wagner, Maximilian ; Naylor, Gavin J.P. ; Pippel, Martin ; Malinsky, Milan ; Mooney, Mark ; Simbirsky, Maria ; Hannigan, Brett T. ; Pesout, Trevor ; Houck, Marlys ; Misuraca, Ann ; Kingan, Sarah B. ; Hall, Richard ; Kronenberg, Zev ; Sović, Ivan ; Dunn, Christopher ; Ning, Zemin ; Hastie, Alex ; Lee, Joyce ; Selvaraj, Siddarth ; Green, Richard E. ; Putnam, Nicholas H. ; Gut, Ivo ; Ghurye, Jay ; Garrison, Erik ; Sims, Ying ; Collins, Joanna ; Pelan, Sarah ; Torrance, James ; Tracey, Alan ; Wood, Jonathan ; Dagnew, Robel E. ; Guan, Dengfeng ; London, Sarah E. ; Clayton, David F. ; Mello, Claudio V. ; Friedrich, Samantha R. ; Lovell, Peter V. ; Osipova, Ekaterina ; Al-Ajli, Farooq O. ; Secomandi, Simona ; Kim, Heebal ; Theofanopoulou, Constantina ; Hiller, Michael ; Zhou, Yang ; Harris, Robert S. ; Makova, Kateryna D. ; Medvedev, Paul ; Hoffman, Jinna ; Masterson, Patrick ; Clark, Karen ; Martin, Fergal ; Howe, Kevin ; Flicek, Paul ; Walenz, Brian P. ; Kwak, Woori ; Clawson, Hiram ; Diekhans, Mark ; Nassar, Luis ; Paten, Benedict ; Kraus, Robert H. S. ; Crawford, Andrew J. ; Gilbert, M. Thomas P. ; Zhang, Guojie ; Venkatesh, Byrappa ; Murphy, Robert W. ; Koepfli, Klaus-Peter ; Shapiro, Beth ; Johnson, Warren E. ; Di Palma, Federica ; Marques-Bonet, Tomas ; Teeling, Emma C. ; Warnow, Tandy ; Graves, Jennifer Marshall ; Ryder, Oliver A. ; Haussler, David ; O’Brien, Stephen J. ; Korlach, Jonas ; Lewin, Harris A. ; Howe, Kerstin ; Myers, Eugene W. ; Durbin, Richard ; Phillippy, Adam M. ; Jarvis, Erich D. / Towards complete and error-free genome assemblies of all vertebrate species. In: Nature. 2021 ; Vol. 592, No. 7856. pp. 737-746.

Bibtex

@article{029e9e003e2a46798a938f6a3eccb832,
title = "Towards complete and error-free genome assemblies of all vertebrate species",
abstract = "High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.",
author = "Arang Rhie and McCarthy, {Shane A.} and Olivier Fedrigo and Joana Damas and Giulio Formenti and Sergey Koren and Marcela Uliano-Silva and William Chow and Arkarachai Fungtammasan and Juwan Kim and Chul Lee and Ko, {Byung June} and Mark Chaisson and Gedman, {Gregory L.} and Cantin, {Lindsey J.} and Francoise Thibaud-Nissen and Leanne Haggerty and Iliana Bista and Michelle Smith and Bettina Haase and Jacquelyn Mountcastle and Sylke Winkler and Sadye Paez and Jason Howard and Vernes, {Sonja C.} and Lama, {Tanya M.} and Frank Grutzner and Warren, {Wesley C.} and Balakrishnan, {Christopher N.} and Dave Burt and George, {Julia M.} and Biegler, {Matthew T.} and David Iorns and Andrew Digby and Daryl Eason and Bruce Robertson and Taylor Edwards and Mark Wilkinson and George Turner and Axel Meyer and Kautt, {Andreas F.} and Paolo Franchini and Detrich, {H. William} and Hannes Svardal and Maximilian Wagner and Naylor, {Gavin J.P.} and Martin Pippel and Milan Malinsky and Mark Mooney and Maria Simbirsky and Hannigan, {Brett T.} and Trevor Pesout and Marlys Houck and Ann Misuraca and Kingan, {Sarah B.} and Richard Hall and Zev Kronenberg and Ivan Sovi{\'c} and Christopher Dunn and Zemin Ning and Alex Hastie and Joyce Lee and Siddarth Selvaraj and Green, {Richard E.} and Putnam, {Nicholas H.} and Ivo Gut and Jay Ghurye and Erik Garrison and Ying Sims and Joanna Collins and Sarah Pelan and James Torrance and Alan Tracey and Jonathan Wood and Dagnew, {Robel E.} and Dengfeng Guan and London, {Sarah E.} and Clayton, {David F.} and Mello, {Claudio V.} and Friedrich, {Samantha R.} and Lovell, {Peter V.} and Ekaterina Osipova and Al-Ajli, {Farooq O.} and Simona Secomandi and Heebal Kim and Constantina Theofanopoulou and Michael Hiller and Yang Zhou and Harris, {Robert S.} and Makova, {Kateryna D.} and Paul Medvedev and Jinna Hoffman and Patrick Masterson and Karen Clark and Fergal Martin and Kevin Howe and Paul Flicek and Walenz, {Brian P.} and Woori Kwak and Hiram Clawson 
and Mark Diekhans and Luis Nassar and Benedict Paten and Kraus, {Robert H. S.} and Crawford, {Andrew J.} and Gilbert, {M. Thomas P.} and Guojie Zhang and Byrappa Venkatesh and Murphy, {Robert W.} and Klaus-Peter Koepfli and Beth Shapiro and Johnson, {Warren E.} and {Di Palma}, Federica and Tomas Marques-Bonet and Teeling, {Emma C.} and Tandy Warnow and Graves, {Jennifer Marshall} and Ryder, {Oliver A.} and David Haussler and O{\textquoteright}Brien, {Stephen J.} and Jonas Korlach and Lewin, {Harris A.} and Kerstin Howe and Myers, {Eugene W.} and Richard Durbin and Phillippy, {Adam M.} and Jarvis, {Erich D.}",
note = "Funding Information: Acknowledgements: We thank the following persons for feedback and support: R. Johnson, E. Karlsson, K. Lindblad Toh, W. Jun, I. Korf, W. Haerty, G. Etherington, B. Clavijo, and A. Komissarov for discussions in the early stages of the project; R. Fuller for help with the G10K website maintenance, and H. Segal for help with VGP website development; M. Linh Pham for help with initial grant writing; L. Shalmiyev for administrative help; D. Church, G. Kol, K. Baruch, O. Barad, I. Liachko, E. Muzychenko, S. Garg, and M. Kolmogorov for preliminary analyses performed on one or more genomes; K. Oliver, C. Corton and J. Skelton for data generation; E. Harry for technical support in scaff10x and Pretext; C. Mazzoni for coordinating students and training at Leibniz Institute for Zoo and Wildlife Research and Berlin Center for Genomics in Biodiversity Research; and M. Driller, C. Caswara, M. Vafadar, N. Hill, D. De Panis, A. Whibley, B. Maloney, C. Mitchell, G. Gallo, J. Gaige, K. Amoako-Boadu, M. Jose Gomez, M. Montero, D. Ratnikov, S. Brown, S. Zylka, S. Marcus, and T. Carrasco for completing training and testing the VGP pipeline by producing ordinal representative genome assemblies not described in this manuscript. We thank our company partners (listed below), NCBI, EBI, and Amazon AWS, including AWS for sponsoring sequence storage. J. Fekecs and D. Leja created the animal images, and J. Kim modified them to silhouettes. We thank them for their permission to publish. A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Ocean and Fisheries, Republic of Korea (20180430). M.C. was supported by Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontiers Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP - RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF 456-2016). The following authors{\textquoteright} contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21 DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-W{\"u}rttemberg and the Universities of the State of Baden-W{\"u}rttemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union{\textquoteright}s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia Mar{\'i}a de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d{\textquoteright}Universitats i Recerca and CERCA Programme del Departament d{\textquoteright}Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J. M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C). Complementary sequencing support for the Anna{\textquoteright}s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. We thank Le Comit{\'e} Scientifique R{\'e}gional du Patrimoine Naturel and Direction de l{\textquoteright}Environnement, de l{\textquoteright}Am{\'e}nagement et du Logement, Guyanne for research approvals and export permits. Publisher Copyright: {\textcopyright} 2021, The Author(s).",
year = "2021",
doi = "10.1038/s41586-021-03451-0",
language = "English",
volume = "592",
pages = "737--746",
journal = "Nature",
issn = "0028-0836",
publisher = "Nature Publishing Group",
number = "7856",

}

RIS

TY - JOUR

T1 - Towards complete and error-free genome assemblies of all vertebrate species

AU - Rhie, Arang

AU - McCarthy, Shane A.

AU - Fedrigo, Olivier

AU - Damas, Joana

AU - Formenti, Giulio

AU - Koren, Sergey

AU - Uliano-Silva, Marcela

AU - Chow, William

AU - Fungtammasan, Arkarachai

AU - Kim, Juwan

AU - Lee, Chul

AU - Ko, Byung June

AU - Chaisson, Mark

AU - Gedman, Gregory L.

AU - Cantin, Lindsey J.

AU - Thibaud-Nissen, Francoise

AU - Haggerty, Leanne

AU - Bista, Iliana

AU - Smith, Michelle

AU - Haase, Bettina

AU - Mountcastle, Jacquelyn

AU - Winkler, Sylke

AU - Paez, Sadye

AU - Howard, Jason

AU - Vernes, Sonja C.

AU - Lama, Tanya M.

AU - Grutzner, Frank

AU - Warren, Wesley C.

AU - Balakrishnan, Christopher N.

AU - Burt, Dave

AU - George, Julia M.

AU - Biegler, Matthew T.

AU - Iorns, David

AU - Digby, Andrew

AU - Eason, Daryl

AU - Robertson, Bruce

AU - Edwards, Taylor

AU - Wilkinson, Mark

AU - Turner, George

AU - Meyer, Axel

AU - Kautt, Andreas F.

AU - Franchini, Paolo

AU - Detrich, H. William

AU - Svardal, Hannes

AU - Wagner, Maximilian

AU - Naylor, Gavin J.P.

AU - Pippel, Martin

AU - Malinsky, Milan

AU - Mooney, Mark

AU - Simbirsky, Maria

AU - Hannigan, Brett T.

AU - Pesout, Trevor

AU - Houck, Marlys

AU - Misuraca, Ann

AU - Kingan, Sarah B.

AU - Hall, Richard

AU - Kronenberg, Zev

AU - Sović, Ivan

AU - Dunn, Christopher

AU - Ning, Zemin

AU - Hastie, Alex

AU - Lee, Joyce

AU - Selvaraj, Siddarth

AU - Green, Richard E.

AU - Putnam, Nicholas H.

AU - Gut, Ivo

AU - Ghurye, Jay

AU - Garrison, Erik

AU - Sims, Ying

AU - Collins, Joanna

AU - Pelan, Sarah

AU - Torrance, James

AU - Tracey, Alan

AU - Wood, Jonathan

AU - Dagnew, Robel E.

AU - Guan, Dengfeng

AU - London, Sarah E.

AU - Clayton, David F.

AU - Mello, Claudio V.

AU - Friedrich, Samantha R.

AU - Lovell, Peter V.

AU - Osipova, Ekaterina

AU - Al-Ajli, Farooq O.

AU - Secomandi, Simona

AU - Kim, Heebal

AU - Theofanopoulou, Constantina

AU - Hiller, Michael

AU - Zhou, Yang

AU - Harris, Robert S.

AU - Makova, Kateryna D.

AU - Medvedev, Paul

AU - Hoffman, Jinna

AU - Masterson, Patrick

AU - Clark, Karen

AU - Martin, Fergal

AU - Howe, Kevin

AU - Flicek, Paul

AU - Walenz, Brian P.

AU - Kwak, Woori

AU - Clawson, Hiram

AU - Diekhans, Mark

AU - Nassar, Luis

AU - Paten, Benedict

AU - Kraus, Robert H. S.

AU - Crawford, Andrew J.

AU - Gilbert, M. Thomas P.

AU - Zhang, Guojie

AU - Venkatesh, Byrappa

AU - Murphy, Robert W.

AU - Koepfli, Klaus-Peter

AU - Shapiro, Beth

AU - Johnson, Warren E.

AU - Di Palma, Federica

AU - Marques-Bonet, Tomas

AU - Teeling, Emma C.

AU - Warnow, Tandy

AU - Graves, Jennifer Marshall

AU - Ryder, Oliver A.

AU - Haussler, David

AU - O’Brien, Stephen J.

AU - Korlach, Jonas

AU - Lewin, Harris A.

AU - Howe, Kerstin

AU - Myers, Eugene W.

AU - Durbin, Richard

AU - Phillippy, Adam M.

AU - Jarvis, Erich D.

N1 - Funding Information: Acknowledgements: We thank the following persons for feedback and support: R. Johnson, E. Karlsson, K. Lindblad Toh, W. Jun, I. Korf, W. Haerty, G. Etherington, B. Clavijo, and A. Komissarov for discussions in the early stages of the project; R. Fuller for help with the G10K website maintenance, and H. Segal for help with VGP website development; M. Linh Pham for help with initial grant writing; L. Shalmiyev for administrative help; D. Church, G. Kol, K. Baruch, O. Barad, I. Liachko, E. Muzychenko, S. Garg, and M. Kolmogorov for preliminary analyses performed on one or more genomes; K. Oliver, C. Corton and J. Skelton for data generation; E. Harry for technical support in scaff10x and Pretext; C. Mazzoni for coordinating students and training at Leibniz Institute for Zoo and Wildlife Research and Berlin Center for Genomics in Biodiversity Research; and M. Driller, C. Caswara, M. Vafadar, N. Hill, D. De Panis, A. Whibley, B. Maloney, C. Mitchell, G. Gallo, J. Gaige, K. Amoako-Boadu, M. Jose Gomez, M. Montero, D. Ratnikov, S. Brown, S. Zylka, S. Marcus, and T. Carrasco for completing training and testing the VGP pipeline by producing ordinal representative genome assemblies not described in this manuscript. We thank our company partners (listed below), NCBI, EBI, and Amazon AWS, including AWS for sponsoring sequence storage. J. Fekecs and D. Leja created the animal images, and J. Kim modified them to silhouettes. We thank them for their permission to publish. A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Ocean and Fisheries, Republic of Korea (20180430). M.C. was supported by Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontiers Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP - RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF 456-2016). The following authors’ contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21 DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia María de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J. M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C). Complementary sequencing support for the Anna’s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. We thank Le Comité Scientifique Régional du Patrimoine Naturel and Direction de l’Environnement, de l’Aménagement et du Logement, Guyanne for research approvals and export permits. Publisher Copyright: © 2021, The Author(s).

PY - 2021

Y1 - 2021

N2 - High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

AB - High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

U2 - 10.1038/s41586-021-03451-0

DO - 10.1038/s41586-021-03451-0

M3 - Journal article

C2 - 33911273

AN - SCOPUS:85103951723

VL - 592

SP - 737

EP - 746

JO - Nature

JF - Nature

SN - 0028-0836

IS - 7856

ER -

ID: 262901571