Abstract

As spatiotemporal data grow in size and complexity, traditional statistical analysis methods can no longer meet the real-time requirements of mining information from Big Data, because the workloads are both data-intensive and computing-intensive. To address the Big Data challenges in geostatistics and to support decision-making, this paper proposes a high-performance spatiotemporal statistical analysis system, Geostatistics-Hadoop. The proposed system has three features: (1) Hadoop is enhanced to handle spatial data in a native format and to execute a number of parallelized spatial analysis algorithms for practical geospatial analysis problems; (2) an Oozie-based workflow system is used to simplify the operation and sharing of spatial analysis services; and (3) a private cloud platform based on Eucalyptus provides on-the-fly, elastic computing resources. Experimental results show that Geostatistics-Hadoop efficiently conducts rapid information mining and analysis of big spatiotemporal data sets, with the support of elastic computing resources from the cloud platform. Adopting cloud computing and a Hadoop cluster to parallelize statistical calculations significantly improves the performance of Big Data analyses.