Johns Hopkins scientists reported that they have successfully used two separate DNA sequencing technologies to assemble the most complete genome sequence to date of Triticum aestivum, the most common cultivated species of wheat used to make bread.
A report on the achievement was published in the Oct. 23 issue of GigaScience -- just a few weeks before a related report on the sequencing of bread wheat's "ancestor," Aegilops tauschii, was published Nov. 15 in Nature.
Together, the wheat genome sequences may help biologists not only better understand the evolutionary history of wheat but also advance the quest for hardier, more pest- and drought-resistant wheat types to help feed the world's growing population, the scientists said.
"After many years of trying, we've finally been able to produce a high-quality assembly of this very challenging genome," said Dr. Steven Salzberg, a Bloomberg distinguished professor of biomedical engineering at the Johns Hopkins University Whiting School of Engineering and the McKusick-Nathans Institute of Genetic Medicine at the Johns Hopkins University School of Medicine.
According to the scientists, bread wheat has one of the most complex genomes known to science, containing an estimated 16 billion base pairs of DNA and six copies of each of its seven chromosomes. By comparison, the human genome is roughly one-fifth that size, with about 3 billion base pairs and two copies of each of its 23 chromosomes. Previously published versions of the bread wheat genome have contained large gaps in the genome's highly repetitive DNA sequence.
"The repetitive nature of this genome makes it difficult to fully sequence," Salzberg said. "It's like trying to put together a jigsaw puzzle of a landscape scene with a huge, blue sky. There are lots of very similar, small pieces to assemble."
The newly assembled bread wheat genome cost $300,000 for the sequencing alone, and it took a year for the Johns Hopkins researchers to assemble 1.5 trillion bases of raw data into a final assembly of 15.34 billion base pairs.
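The figures quoted above imply a rough sequencing depth, that is, how many times on average each position in the genome was read. The short Python sketch below works through that arithmetic. The function and constant names are illustrative and do not come from the researchers' report; the 16 billion base-pair genome estimate is the one quoted earlier in this article, and dividing raw bases by assembly size gives only an approximate average across both sequencing technologies.

```python
# Back-of-the-envelope figures taken from the article text.
RAW_BASES = 1.5e12            # 1.5 trillion bases of raw sequence data
ASSEMBLY_BP = 15.34e9         # final assembly: 15.34 billion base pairs
ESTIMATED_GENOME_BP = 16e9    # estimated genome size: 16 billion base pairs


def fold_coverage(raw_bases: float, genome_bp: float) -> float:
    """Average number of times each base was read (raw bases / genome size)."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return raw_bases / genome_bp


def fraction_assembled(assembly_bp: float, estimated_bp: float) -> float:
    """Share of the estimated genome represented in the final assembly."""
    if estimated_bp <= 0:
        raise ValueError("estimated_bp must be positive")
    return assembly_bp / estimated_bp


if __name__ == "__main__":
    depth = fold_coverage(RAW_BASES, ASSEMBLY_BP)
    share = fraction_assembled(ASSEMBLY_BP, ESTIMATED_GENOME_BP)
    print(f"Approximate depth: {depth:.1f}-fold")
    print(f"Assembly covers about {share:.1%} of the estimated genome")
```

Under these assumptions the raw data amount to roughly 98-fold coverage, and the final assembly represents about 96 percent of the estimated genome size.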
To do it, Salzberg and his team used two types of genome sequencing technology: high-throughput sequencing and nanopore sequencing. As its name implies, high-throughput sequencing generates massive amounts of DNA sequence data very quickly and cheaply, although the fragments it reads are very short, just 150 base pairs long for this project. To help assemble the repetitive areas, the Johns Hopkins team used nanopore sequencing, which forces DNA through tiny pores with an electric current running through them. The technology enables scientists to read up to 20,000 base pairs at a time by measuring changes in the flow of the current as a strand of DNA passes through the pore.
Salzberg said sequencing a genome of this size requires not only genetic expertise but also very large computing resources, which are available at relatively few research institutions around the world. The team relied heavily on the Maryland Advanced Research Computing Center, a computing center shared by Johns Hopkins and the University of Maryland, which has more than 20,000 computer cores (CPUs) and more than 20 petabytes of data storage. The team used approximately 100 CPU-years of computing time to put this genome together.
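To give a sense of scale for the computing figures above, the sketch below converts CPU-years into wall-clock time under an idealized assumption: that the work divides perfectly across cores with no waiting or overhead. That assumption is ours, not the researchers'; real genome assembly does not parallelize perfectly, the center is a shared resource, and the project took about a year. The function name is illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length in hours


def ideal_wall_clock_hours(cpu_years: float, cores: int) -> float:
    """Wall-clock hours if the work split perfectly across `cores` (idealized)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_years * HOURS_PER_YEAR / cores


if __name__ == "__main__":
    # Figures from the article: about 100 CPU-years, more than 20,000 cores.
    for cores in (1, 1_000, 20_000):
        hours = ideal_wall_clock_hours(100, cores)
        print(f"{cores:>6} cores: {hours:,.1f} hours ({hours / 24:,.1f} days)")
```

Even in this best case, using all 20,000 cores at once would take a little under two days, while a single core would need a century, which illustrates why such projects depend on large shared computing centers.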
Salzberg and his team also participated in the collaborative effort reported in the journal Nature to sequence Aegilops tauschii, a wild relative of wheat that is commonly referred to as goatgrass and is still found in parts of Asia and Europe. Its genome is approximately one-third the size of the bread wheat genome but has similar levels of repetition.