A tweaked gene or two among the millions or even billions of base pairs that make up an organism's DNA are often all that distinguish a drought-tolerant plant, a better-performing animal or a person predisposed to cancer from their biological peers, Montana State University noted in an announcement.
That's why a better understanding of genetic variation within a species could, among other things, help improve selection of crops for local conditions and detection of disease, according to Joann Mudge, senior research scientist at the nonprofit National Center for Genome Resources (NCGR).
A generation ago, recording an organism's DNA from beginning to end was so laborious and expensive that scientists celebrated when they completed the task for a single bacterium. As genome sequencing becomes faster and cheaper, however, scientists increasingly have access to insights about which genes do what, Mudge said.
"We're sequencing multiple individuals of some species," including plants and other complex organisms, Mudge said, and that allows scientists to begin to sort out which segments of DNA form a species' core genome and which correspond to traits shared by only some individuals.
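The core-versus-variable distinction Mudge describes can be illustrated with a minimal sketch. It assumes each sequenced individual has already been reduced to a set of gene identifiers, which is a large simplification of real pangenome analysis and is not the method used by the NCGR and Montana State software. The function name `split_pangenome`, the strain labels and the gene names are all hypothetical.

```python
def split_pangenome(strain_genes):
    """Split genes into a core set and an accessory set.

    strain_genes: dict mapping a strain name to the set of gene
    identifiers found in that strain (hypothetical input format).
    """
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)        # every gene seen in any strain
    core = set.intersection(*gene_sets)  # genes present in every strain
    accessory = pan - core               # genes present in only some strains
    return core, accessory


# Made-up toy data for illustration only.
strains = {
    "strain_A": {"gene1", "gene2", "gene3"},
    "strain_B": {"gene1", "gene2", "gene4"},
    "strain_C": {"gene1", "gene2", "gene3", "gene5"},
}

core, accessory = split_pangenome(strains)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In this toy case the genes common to all three strains form the core, and everything else falls into the accessory set. Real genomes vary in ways a simple presence-or-absence comparison cannot capture, which is part of what makes the analysis hard.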
That said, the growing field of pangenomics presents a major analytical challenge. That's why NCGR recently partnered with Montana State University computer scientists to develop software that can compare multiple genomes and make sense of the results. The project is backed by a three-year, $662,000 grant from the National Science Foundation.
"We've been very happy with the way it's working," said Brendan Mumey, a professor in the Gianforte School of Computing in Montana State's Norm Asbjornson College of Engineering, who is co-leading the project with Mudge.
According to Mumey, previously available software struggled to analyze pangenomes even for relatively simple organisms such as the common yeast Saccharomyces cerevisiae, whose genome contains only 12 million of the DNA units known as base pairs. (By comparison, the human genome contains about 3 billion base pairs, roughly 250 times as many.) Among the known strains of the yeast, minor genetic variations account for physical adaptations such as the ability of brewer's yeast to survive alcohol during the making of beer and wine.
"It's a classic 'big data' problem," Mumey said in reference to the field of computing that deals with exceptionally large and complex data sets.
Montana State assistant professor of computer science Indika Kahanda, a member of the research team, specializes in developing the machine learning models that help the new software adjust its gene-sorting analysis according to input from scientists. That approach has helped the team, which includes NCGR research scientist Thiru Ramaraj, identify genes of interest in a yeast pangenome that includes roughly 100 strains.
Mumey said the researchers' next step is to continue to refine the software so it can handle larger and more complex genomes, such as those of plants. The computational techniques being used "are still in their infancy," he said.
Eventually, pangenomics could help medical professionals diagnose a variety of diseases that have a genetic component, Mudge said.
The improved pangenomics tool is already helping scientists move beyond comparing genomes to a single, arbitrary reference, Mudge said. Instead, researchers can represent a species' entire genome with all its nuance and variety.
"It's a hard problem to solve," Mudge said. "This has been a great collaboration."