A group of researchers, primarily from Texas A&M University’s Department of Geography, recently published a paper in the journal SoftwareX describing NetCDFaster, a new software tool designed to extract and visualize NetCDF datasets much faster than previously possible.
The NetCDF format is commonly used for data storage in climatology, oceanography and related fields. The open format supports complex, high-dimensional datasets, which is helpful for storing real-world measurements of variables such as temperature that vary with both time and location.
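To make the idea of a high-dimensional dataset concrete, here is a minimal sketch in plain Python of a temperature variable indexed by time, latitude and longitude, along with the kind of subset extraction researchers typically perform. It does not read a real NetCDF file and is not NetCDFaster's code; the grid values and the helper names `index_range` and `subset` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy grid: one coordinate list per dimension.
times = [0, 1, 2]                      # e.g. days since the first record
lats = [30.0, 31.0, 32.0]              # degrees north
lons = [-97.0, -96.0, -95.0, -94.0]    # degrees east

# Made-up temperature values, addressed as temperature[t][i][j]
# for time index t, latitude index i and longitude index j.
temperature = [
    [[20.0 + t + 0.5 * i + 0.1 * j for j in range(len(lons))]
     for i in range(len(lats))]
    for t in range(len(times))
]


def index_range(coords, low, high):
    """Return the indices of coordinate values that fall within [low, high]."""
    return [k for k, value in enumerate(coords) if low <= value <= high]


def subset(data, t_index, lat_idx, lon_idx):
    """Extract a latitude-by-longitude patch at a single time step."""
    return [[data[t_index][i][j] for j in lon_idx] for i in lat_idx]


# Select a small region at the second time step.
lat_idx = index_range(lats, 30.5, 32.0)
lon_idx = index_range(lons, -96.5, -94.5)
patch = subset(temperature, 1, lat_idx, lon_idx)
```

In practice, a dedicated NetCDF library would handle the file reading and slicing; the sketch only shows how a value is located by its position along each dimension.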
The paper’s first author is Zhenlei Song, a former graduate student who graduated last year and now works for Esri, a geographic information software company focused on mapping and spatial analysis. Dr. Zhe Zhang is the study’s corresponding author.
The team found that existing tools optimize either for user interface at the cost of performance or for data retrieval at the cost of user experience. The paper noted that “Researchers often face repetitive and time-consuming tasks, such as writing custom scripts to extract data subsets and then switching to separate tools for visualization.”
When testing NetCDFaster, the team found that it performed roughly on par with other tools on small datasets and was on the order of seven times faster on datasets larger than one gigabyte. For the initial release, the team optimized NetCDFaster for one-dimensional coordinate systems with moderate spatial resolution because such data are the most common in publicly available NetCDF datasets.
This material is based on work funded by the National Science Foundation under Grant Nos. 2137684, 2019129, 2321069, 2339174, 2519476 and 2526748. The project is supported by the NSF NAIRR Pilot and CAREER projects, for both of which Zhang is the principal investigator.