The National Science Foundation (NSF) has awarded $3.2 million to a multidisciplinary team of researchers working to standardize how Earth science data is described, enabling scientific data search engines that support not only data discovery but also data use.
Shuang Zhang, assistant professor in the Department of Oceanography in Texas A&M University’s College of Arts and Sciences, is a co-principal investigator for the Democratized Cyberinfrastructure for Open Discovery to Enable Research (DeCODER) project, which began Oct. 1. Kenton McHenry, National Center for Supercomputing Applications (NCSA) associate director for software, is leading the project, which will expand and extend NCSA’s EarthCube GeoCODES framework and community to unify data and tool description and reuse across geoscience domains.
“The internet works because of defined standards and protocols (e.g., TCP/IP, HTTP, HTML),” McHenry said. “This allows software, which must be sustained, to change and evolve over time, with better software with new features to emerge, while still allowing everything to just work from the user perspective. That’s what we are doing here for research data through the adoption of science-on-schema.”
The DeCODER project is a collaborative research effort between NCSA, the San Diego Supercomputer Center (SDSC), Scripps Institution of Oceanography, Virginia Tech, Syracuse University, Texas A&M University and the University of California, Berkeley.
“This effort will assist researchers seeking to reuse data and bridge subdomains, especially in the Earth sciences,” said Christine Kirkpatrick, division director of research data services at SDSC.
The project will use the DeCODER platform to enable similar data discovery and reuse across scientific communities, including ecological forecasting, deep ocean science and low-temperature geochemical science.
“The past several decades have seen a proliferation in the amount of data documenting Earth’s low-temperature surface processes, such as global carbon cycling through the river-land-atmosphere system and the interplay between anthropogenic footprints and environmental feedbacks,” said Texas A&M’s Zhang. “Coupling data science techniques with these datasets is helping reveal the intrinsic patterns of nature’s low temperature processes that are sometimes extremely difficult to be captured by classical physical process models.”
However, due to the inherent complexity of Earth’s surface processes, Zhang said the datasets documenting them typically originate from a wide range of disciplines and deposition locations and also vary in size and format, which hinders data-driven discoveries.
“The DeCODER project will help the community of low-temperature geochemistry to build an online searching framework to retrieve the high-dimensional datasets in a more streamlined and efficient way,” he said. “Part of the outcome of DeCODER is expected to greatly push forward the fundamental research in using data to delineate Earth’s surface processes and patterns both on the regional and global scale.”
Tao Wen, an assistant professor in Earth and Environmental Sciences at Syracuse University, believes the DeCODER project will help make large and diverse datasets more accessible for those working in geochemistry and other fields of research.
“In this big-data era, the geoscience subfield of low-temperature geochemistry is falling behind in making research datasets findable, accessible, interoperable and reusable to the geoscience communities,” Wen said. “This is at least partially due to the extremely large variety in the size and scale of datasets being used by low-temperature geochemists to advance their understanding of the geochemical processes in terrestrial Earth’s surface systems.”
Virginia Tech researchers will be working to advance the discoverability of ecological forecasts through the development of protocols and software to archive and document model predictions of ecological dynamics. For example, if a researcher searches, “find forecasts of algae in lakes across the U.S.,” the search could yield current forecasts to help guide decision making and support environmental management.
The team also will tackle deep ocean science. “To understand and address global deep-sea challenges, we must find and leverage data from across national and international data facilities and programs,” said Karen Stocks, director of the Geological Data Center at Scripps Institution of Oceanography.
The researchers said they will also continue to support the scientific community by providing DeCODER as an open-source resource that any scientific community can customize to create lightweight scientific gateways that bring together relevant distributed resources.
Read more at NCSA.
About Research At Texas A&M University
As one of the world’s leading research institutions, Texas A&M is at the forefront of making significant contributions to scholarship and discovery, including in science and technology. Research conducted at Texas A&M generated annual expenditures of more than $1.148 billion in fiscal year 2021. Texas A&M ranked 14th in the National Science Foundation’s most recent Higher Education Research and Development Survey, based on expenditures of more than $1.131 billion in fiscal year 2020. Texas A&M’s research creates new knowledge that provides basic, fundamental and applied contributions resulting, in many cases, in economic benefits to the state, nation and world. To learn more, visit Research @ Texas A&M.