USA
June 13, 2017
A large group of scientists has published a new reference genome assembly for maize. Generated with SMRT Sequencing and other technologies, it represents a major leap forward in accurately portraying and annotating the genome of this important crop.
“Improved maize reference genome with single-molecule technologies” comes from lead author Yinping Jiao, senior author Doreen Ware, and collaborators at Cold Spring Harbor Laboratory, the USDA ARS, and many other institutions. They embarked on the project because the existing reference for maize, based on Sanger technology and released in 2009, “is composed of more than 100,000 small contigs, many of which are arbitrarily ordered and oriented, markedly complicating detailed analysis of individual loci and impeding investigation of intergenic regions crucial to our understanding of phenotypic variation and genome evolution,” the authors explain. A higher-quality assembly would be extremely useful for crop breeding and selection programs as well as basic research.
The new reference is based on PacBio sequencing data, which led to a preliminary assembly with fewer than 3,000 contigs and a contig N50 of 1.2 Mb. Scientists then layered in data from an optical map, a BAC-based minimum tiling path, and a high-density genetic map. The end result: a high-quality 2 Gb assembly with just 2,522 gaps. “The new maize B73 reference genome has 240-fold higher contiguity than the recently published short-read genome assembly of maize cultivar PH207,” they report.
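For readers less familiar with the contig N50 statistic cited above: it is the length of the contig at which, counting from the longest contig down, the running total first reaches half of the assembly's total length. The snippet below is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the maize data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk from the longest contig to the shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths, for illustration only (not maize data).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

A higher N50 means more of the assembly sits in long contigs, which is why the statistic is commonly used to compare contiguity between assemblies.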
To assess the new assembly, the team compared it to the previous Sanger-based reference. That “revealed more than 99.9% sequence identity and a 52-fold increase in the mean contig length, with 84% of the BACs spanned by a single contig from the long reads assembly,” the authors write. ChIP-seq analysis showed that centromeres in the new assembly were mostly intact and correctly placed. The new assembly fixed many known mis-oriented regions in the reference genome, and an updated annotation consolidated gene models with the support of 111,000 full-length transcripts from SMRT Sequencing. “Our reference assembly also vastly improved the coverage of regulatory sequences, decreasing the number of genes exhibiting gaps in the 3-kb region(s) flanking coding sequence from 20% to <1%,” the team adds.
The scientists also examined transposable elements, which make up much of the maize genome and play an important role in it. The previous maize annotation had few intact representations of these elements; fewer than 1% of long terminal repeat retrotransposon copies were complete. The team incorporated “a new homology-independent annotation pipeline” and uncovered 1.2 Gb of intact retrotransposons, about half of which were “nested retrotransposon copies disrupted by the insertion of other transposable elements,” they note. “Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize.” This information will contribute to a better understanding of the diversity and evolution of maize varieties.
In closing, the scientists write, “Our improved assembly of the B73 genome, generated using single-molecule technologies, demonstrates that additional assemblies of other maize inbred lines and similar high-quality assemblies of other repeat-rich and large-genome plants are feasible.”
For more information about maize genomics, check out our maize genome case study, this publication about gene copy number, and this transcriptome study.