The Vertebrate Genomes Project: a new era of genome sequencing
16 new high-quality reference genomes from vertebrates are published, advancing comparative biology, conservation, and health research
The international Vertebrate Genomes Project (VGP) today publishes its flagship study, focused on genome assembly quality and standardization for the field of genomics, in a special issue of Nature, along with 20 associated publications. This study presents 16 diploid, high-quality, near error-free, and near-complete vertebrate reference genome assemblies that result from the five-year pilot phase of the VGP. Understanding the DNA sequence of all vertebrates will enable the study of how genes have contributed to the evolution and survival of these species, and it will also help answer questions in health research. Genome data were primarily generated at three sequencing hubs that have invested in the mission of the VGP: the Rockefeller University Vertebrate Genome Lab, New York, USA (partly supported by the Howard Hughes Medical Institute), the Wellcome Sanger Institute, UK, and the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany.
Growing out of the decade-old mission of the Genome 10K Community of Scientists (G10K) to sequence the genomes of 10,000 vertebrate species and other comparative genomics efforts, the goal of the VGP is to generate near error-free reference genome assemblies of all 72,000 extant vertebrate species. Reference genome assemblies provide a map of a species’ DNA sequence and its spatial context, that is, where along the chromosomes a specific piece of DNA sequence can be found. With its ambitious mission the VGP aims to address fundamental questions in biology, conservation, and disease including identifying species most genetically at risk for extinction and preserving their genetic information for future generations. The high-quality VGP genomes will become the main references for their species and will be stored in the Genome Ark, a digital open-access library of genomes.
Costs of Genome Sequencing
In the past, generating reference assemblies was so expensive and labor-intensive that they were produced only for humans and the most important model organisms, and even these assemblies still contained gaps and errors. However, a complete understanding of evolutionary processes and other fundamental questions in biology requires high-quality reference genome assemblies of all species. Adam Phillippy, chair of the VGP genome assembly and informatics working group of over 100 members and head of the Genome Informatics Section of the National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA, says: “Completing the first vertebrate reference genome, human, took over 10 years and $3 billion dollars. Thanks to continued research and investment in DNA sequencing technology over the past 20 years, we can now repeat this amazing feat multiple times per day for just a few thousand dollars per genome.”
Contribution of Max Planck Institutes to the VGP
One of the sequencing hubs is the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) in Dresden. Gene Myers, lead of the VGP sequencing hub at the MPI-CBG and the Center for Systems Biology Dresden (CSBD), says: “The VGP project is at the vanguard of the creation of a genomic catalog in analogy with Linnaeus’ classification of life. I and my colleagues in Dresden are excited to be contributing such superb genome reconstructions with the financial support of the Max-Planck Society of Germany.” The Dresden scientists are part of the DRESDEN-concept Genome Center (DcGC) and have special expertise in the use of various “long-read” sequencing technologies. Longer pieces of sequence are important because they can span and resolve complex and repetitive parts of the genome, allowing sequences to be assigned unambiguously. The Dresden hub has contributed to three of the 16 released genomes: the greater horseshoe bat (Rhinolophus ferrumequinum), the flier cichlid fish (Archocentrus centrarchus), and the pale spear-nosed bat (Phyllostomus discolor). The first two genomes were wholly the work of the Dresden hub; for P. discolor, the genome data were produced at the Rockefeller hub, and the assembly and transcriptomic data were produced by Dresden.
The bat tissue samples were provided by members of the Bat1K consortium, led by Sonja Vernes, Max Planck Institute for Psycholinguistics, the Netherlands, and the University of St Andrews, UK, and Emma Teeling, University College Dublin. The Bat1K consortium has the goal of sequencing all living bat species and has been instrumental in sequencing and analyzing bat genomes as a collaborator and partner of the overall VGP. The flier cichlid tissue samples came from Axel Meyer, University of Konstanz, Germany.
Robert Kraus from the Max Planck Institute of Animal Behavior was an early contributor of ideas and vision to the mission of the VGP and pushed for a focus on fewer genomes of higher quality. He also sequenced several bird species to study the basis of immune responses to avian influenza. These genomes are still in progress within the VGP and will be part of upcoming releases.
The excellent quality of these genome assemblies enables novel discoveries at unprecedented scale, with implications for characterizing the biodiversity of all life, species conservation, and human health and disease. The first high-quality reference genomes of six bat species, generated by the Bat1K consortium, were published in July 2020 in Nature. They revealed selection and loss of immunity-related genes that may underlie bats’ unique tolerance to viral infection. These findings provide novel avenues of research to increase survivability and are particularly relevant for emerging infectious diseases, such as the current COVID-19 pandemic.
Sonja Vernes, a founding director of the Bat1K consortium and UKRI Future Leaders Fellow said: “These new genomes are a huge step towards answering key questions about biology and evolution across vertebrates. We can already see exciting new features of chromosomal evolution, including changes found in the six bat species provided by the Bat1K consortium that may contribute to enhanced immune systems and tolerance to pathogens. In the future these genomes will also help us understand complex behaviors like the evolution of animal communication and how human speech and language evolved.”
Specific to conservation, analyses of the VGP genomes for the kākāpō, a flightless parrot endemic to New Zealand, and the vaquita, a small porpoise endemic to the Gulf of California and the most endangered marine mammal, imply evolutionary and demographic histories of purging harmful mutations in the wild and of long-term small population size at genetic equilibrium.
A new era in genome science through collaboration
This massive comparative genomics project represents a new era of innovation in genome science, developing and using novel pipelines for state-of-the-art and consistent sequencing, assembly, and annotation techniques, with implications for addressing fundamental questions in comparative biology, genetics, biodiversity, conservation and health. It also serves as a model of scientific cooperation for other large-scale genomic projects based on the extensive infrastructure, collaboration and leadership of the VGP involving hundreds of international scientists working together from more than 50 institutions in 12 different countries since the VGP was initiated in 2016.
As a next step, the VGP will continue to work collaboratively across the globe and with other consortia to complete Phase 1 of the project, whose goal is to sequence one representative species from each of the 260 orders of vertebrates. Technological advances, improved computational methods, and the ever-decreasing cost of sequencing have enabled the VGP to pursue the ambitious goal of producing a reference genome assembly for each of the extant vertebrate species on Earth. In the first phase of the project, the VGP has focused on testing and improving genome sequencing and assembly approaches and on assembling a first set of 260 high-quality genomes of species representing all vertebrate orders. Phase 2 will focus on representative species from each vertebrate family and is currently in the process of sample identification and fundraising. The VGP has an open-door policy and welcomes others to join its efforts, whether through fundraising, sample collection, generating genome assemblies, or contributing their own genome assemblies that meet the VGP metrics as part of its overall mission.
All sequence data and assemblies are being made freely available as they are being produced and can be downloaded or browsed at GenomeArk (https://vgp.github.io/genomeark/), Genbank (https://www.ncbi.nlm.nih.gov/bioproject/489243), Ensembl (https://projects.ensembl.org/vgp/), and UCSC (https://hgdownload.soe.ucsc.edu/hubs/VGP/).
Full list of the 16 genomes:
pale spear-nosed bat (Phyllostomus discolor)
greater horseshoe bat (Rhinolophus ferrumequinum)
Canada lynx (Lynx canadensis)
platypus (Ornithorhynchus anatinus)
zebra finch (Taeniopygia guttata)
kākāpō (Strigops habroptilus)
Anna’s hummingbird (Calypte anna)
Goode’s thornscrub tortoise (Gopherus evgoodei)
two-lined caecilian (Rhinatrema bivittatum)
zig-zag eel (Mastacembelus armatus)
climbing perch (Anabas testudineus)
flier cichlid (Archocentrus centrarchus)
eastern happy cichlid (Astatotilapia calliptera)
channel bull blenny (Cottoperca gobio)
blunt-snouted clingfish (Gouania willdenowi)
thorny skate (Amblyraja radiata)
The Max Planck Institute of Molecular Cell Biology and Genetics
The Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) in Dresden, Germany is one of more than 80 institutes of the Max Planck Society, an independent, non-profit organization in Germany. 500 curiosity-driven scientists from over 50 countries ask: How do cells form tissues? The basic research programs of the MPI-CBG span multiple scales of magnitude, from molecular assemblies to organelles, cells, tissues, organs, and organisms. www.mpi-cbg.de, Twitter: @mpicbg
The Rockefeller University
The Rockefeller University is one of the world’s leading biomedical research universities and is dedicated to conducting innovative, high-quality research to improve the understanding of life for the benefit of humanity. The university’s 70 laboratories conduct research in neuroscience, immunology, biochemistry, genomics, and many other areas. A community of 2,000 faculty, students, postdocs, technicians, clinicians, and administrative personnel work on our 16-acre Manhattan campus. Our unique approach to science has led to some of the world’s most revolutionary and transformative contributions to biology and medicine. During Rockefeller’s 120-year history, our scientists have won 26 Nobel Prizes, 24 Albert Lasker Medical Research Awards, and 20 National Medals of Science.
The Wellcome Sanger Institute
The Wellcome Sanger Institute is a world leading genomics research centre. We undertake large-scale research that forms the foundations of knowledge in biology and medicine. We are open and collaborative; our data, results, tools and technologies are shared across the globe to advance science. Our ambition is vast – we take on projects that are not possible anywhere else. We use the power of genome sequencing to understand and harness the information in DNA. Funded by Wellcome, we have the freedom and support to push the boundaries of genomics. Our findings are used to improve health and to understand life on Earth. Find out more at www.sanger.ac.uk or follow us on Twitter, Facebook, LinkedIn and on our Blog.
About the DRESDEN-concept Genome Center (DcGC)
The DcGC is a joint sequencing center between the Technische Universität Dresden and the MPI-CBG. It is one of four DFG-funded German competence centers for next-generation sequencing. The cooperative project combines next-generation sequencing and genomics experts of the TU Dresden and MPI-CBG, as well as of the Center for Systems Biology Dresden (CSBD) and the Center for Regenerative Therapies Dresden (CRTD) at the TU Dresden. The center consists of three platforms focusing on long-read, single-cell, and short-read sequencing, and covers the major state-of-the-art next-generation sequencing devices. The long-read platform runs several PacBio SEQUEL systems, Oxford Nanopore PromethION sequencers, and a Bionano Genomics Saphyr optical mapping device, and offers different chromatin conformation capture protocols. We have longstanding experience in generating and assembling the highest-quality genomes from complex organisms. We routinely team up with scientists and experts to get the best genome data out of challenging samples. Our bioinformatics and computing experts work on assemblies of all kinds of genomes, including very large and complex genomes that reach beyond the normal scale. If you want to learn more about DcGC, please visit https://genomecenter.tu-dresden.de/about-us and follow us on Twitter @DcGenomeCenter
The Vertebrate Genome Laboratory at the Rockefeller University
The Vertebrate Genome Laboratory (VGL) at the Rockefeller University specializes in long-read genomic technologies. The VGL is one of the three VGP sequencing hubs. It is equipped with cutting-edge genomic technologies including several Pacific Biosciences and Oxford Nanopore sequencers, a Bionano Genomics Saphyr optical mapper, and a 10x Genomics Chromium microfluidics instrument. Composed of a team of experts in long reads and ultra-high-molecular-weight DNA, the VGL strives to find a way to decipher life’s blueprint from any sample, even the most challenging ones. Using state-of-the-art technologies and extensive international collaborations, we are devoted to filling the gap between field scientists and geneticists. We are particularly proud to play our small part in the effort to reverse species extinction by sequencing genomes of endangered species before it is too late. Learn more about us at http://vertebrategenomelab.org and follow us on Twitter at @genomewarriors
Howard Hughes Medical Institute
The Howard Hughes Medical Institute plays an important role in advancing scientific research and education in the United States. Its scientists, located across the country and around the world, have made important discoveries that advance both human health and our fundamental understanding of biology. The Institute also aims to transform science education into a creative, interdisciplinary endeavor that reflects the excitement of real research. HHMI’s headquarters are located in Chevy Chase, Maryland, just outside Washington, DC.
Contact for scientific information:
Professor Gene Myers
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Center for Systems Biology Dresden, Germany