Massive amounts of agricultural data are being collected around the world by scientists utilizing drones for remote sensing, agronomic yield measurements or plant breeding variety trials, but Dr. Seth Murray, Texas A&M AgriLife Research corn breeder at College Station, Texas, is calling on the agricultural scientific community to address the lack of plans or infrastructure to catalog and make these data accessible for future research and synthesis.
“We have this huge emerging problem centered on big data,” he said, speaking at the recent Crop Science Society of America conference in San Antonio, Texas.
Murray said big data permeates all facets of agricultural research and agricultural production. When properly curated, the data can be used beyond the initial experiments, providing inexpensive but valuable opportunities to improve breeding and management knowledge through analyses that were never originally planned.
“Not a lot of people are thinking about this, but it will be a real pain to try to go back and find data later, so we need to address it now,” he said. “Right now, people generate data, keep it on their local computer, write a paper and then are supposed to put the data somewhere.”
Where that “somewhere” is remains unclear in many cases, and no organized effort is being made to anticipate it, Murray said.
“Think of the world of genomics,” he said. “The data is accessible through the National Center for Biotechnology Information (NCBI), where you can access nearly any species’ genomic data in this world and search across species.
“I think we should really learn from the genomics field; it’s the gold standard in publicly pooling data,” Murray said. “I think history is very useful; we can learn from other people’s mistakes. When you look at the genomics data explosion, one big lesson was to get reasonably prepared for the amount collected.”
He added, “Our ultimate goal is to advance the conversation among our agricultural science partners to create a system conducive to data sharing.”
Murray said his research program is using unoccupied aerial vehicles, or unmanned aircraft system (UAS) drones, to collect remote sensing data throughout the growing season and then turn the data into actionable information for plant breeding through the “FAIR” data principles: findable, accessible, interoperable and reusable.
“There are a lot of things in the world of genomics that hit all the FAIR standards and allow people to do these really complicated, cool analyses on the origins of life based on data a corn breeder may have collected, but there’s nothing like this when it comes to remote sensing data. There’s nothing like this when it comes to plant breeding data,” he said. “We don’t have these repositories like NCBI to store and communicate data.”
Think data use and reuse with drones, he said. Data collection is the easiest part of the process. The harder part is all the further work with the data – stitching together images like the panorama feature on cameras, making 3D point clouds, extracting the data for each plot – before exporting it from GIS [geographic information system] software to statistical software, where it is usable by most researchers.
“We need to save it all, because in 10 years, the algorithms will improve, and we can do a better job of extracting useful data. If you don’t save your data from past years, obviously you can never get it back,” Murray said.
He told his fellow scientists that there is a need for, and an anticipated benefit from, developing data sharing standards, incentivizing researchers to share data and building a data sharing infrastructure within agricultural research.
Those needs were identified in a recent Council for Agricultural Science & Technology (CAST) publication to which Murray contributed. The authors presented the factors in the current system of agricultural research that foster ambivalence toward data sharing. They also described the advantages and shortcomings of emerging platforms, networks and repositories intended to facilitate data sharing in agriculture.
Murray said the authors realize that for their effort to have an impact, research in food production must also pursue larger efforts that integrate social, economic and environmental components, rather than just the smaller-scale, individual-effort studies that are often funded and emphasized.
“Our ultimate goal is to advance the conversation among our agricultural science partners to create a system conducive to data sharing and team science needed to address the complex, grand-challenge questions in food systems,” he said.
Murray said he is a part of the “Genomes to Fields” program, which involves 35 professors across the country growing the same genetic populations of corn and collecting the data in exactly the same way.
“We’ve evaluated over 180,000 plots as a team now -- 2,500 hybrids in 162 environments,” he said. “How do you deal with that much data? First, hire a program coordinator and agree upon standards, because there will be multiple terabytes of data to be dealt with very quickly.”
Murray said there is no way he could afford to collect all those data as an individual, but with coordination, the data are available and can be used in anyone’s studies on how to improve corn. This is especially true with UAS, or drones, which can automate routine measurements such as plant height to estimate grain yield or disease, find signatures of excellence that point to the most promising varieties in the field, or identify stress signatures for farmer management -- “but only if we know where that data is stored,” he said. “We recently made a season of UAS data and associated metadata public using the FAIR data principles. Our UAS work included 40 faculty in six colleges here at Texas A&M. How do you share and communicate that data to make sure it doesn’t get lost?”
Murray said scientists can still synthesize knowledge in journal articles but suggested that it might not be the best use of time for solving problems.
“The best use of our time might be making those data sets publicly available so people can do better and bigger studies to find the most important principles to improve agriculture, not just looking at a single environment,” he said.
This will mean collaborating with a data scientist from the beginning. Data sharing will have to be treated as a need and not as an afterthought, and it will take money, Murray said.
“It will mean taking part of your research budget and putting it into communicating your data and maybe hiring some people who can use some of the old data to make decisions. People are synthesizing data that was never designed for the experiment it was collected in. These synthesis studies will make the most reliable and impactful findings going forward,” he said.