Abstract

Quality control can effectively improve the quality of surface meteorological observations. To keep a quality control model stable and effective across different terrain and climate conditions, the model must generalize well. Algorithms such as Random Forest provide this generalization ability, but machine learning algorithms run more slowly than traditional mathematical models. This paper therefore proposes a Random Forest quality control algorithm based on principal component analysis (PCA-RF). Fifteen target stations under different climatic and geomorphological conditions were selected, and the method was tested using observations collected four times daily at neighboring stations from 2005 to 2014. The results show that using PCA to analyze the elemental composition and select elements with high correlation factors, and then applying the Random Forest algorithm, effectively reduces run time while maintaining model accuracy. The PCA-RF model outperforms the Spatial Regression method in training sample dependence, model prediction accuracy, and error detection rate. The PCA-RF method is therefore a better model for the spatial quality control of multiple elements of surface air temperature observations.
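As a rough illustration of the general idea, the sketch below chains PCA and a Random Forest regressor with scikit-learn and flags target-station observations that deviate strongly from the value predicted from neighboring stations. It is not the authors' implementation: the synthetic data, the function names (`make_synthetic_data`, `fit_pca_rf`, `flag_suspect`), the 95% retained-variance setting, the forest hyperparameters, and the residual threshold `f = 3.0` are all assumptions made for this example. It also uses PCA as a plain dimension-reduction step, which is a simplification of the element selection by correlation described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_samples=2000, n_neighbors=8, seed=0):
    """Synthetic stand-in (assumption) for neighbor and target temperatures."""
    rng = np.random.default_rng(seed)
    # Shared regional signal plus independent station-level noise, in deg C.
    regional = 15.0 + 10.0 * np.sin(np.linspace(0.0, 20.0 * np.pi, n_samples))
    neighbors = regional[:, None] + rng.normal(0.0, 1.0, (n_samples, n_neighbors))
    target = regional + rng.normal(0.0, 0.5, n_samples)
    return neighbors, target


def fit_pca_rf(X_train, y_train, variance_kept=0.95, seed=0):
    """Standardize inputs, keep components explaining `variance_kept`, fit a forest."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=variance_kept),
        RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=seed),
    )
    model.fit(X_train, y_train)
    return model


def flag_suspect(model, X, y_observed, residual_std, f=3.0):
    """Flag observations whose residual exceeds f times a reference residual spread."""
    residual = y_observed - model.predict(X)
    return np.abs(residual) > f * residual_std


X, y = make_synthetic_data()

# Chronological split: train, calibrate the residual spread, then test.
X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:1500], y[1000:1500]
X_test, y_test = X[1500:], y[1500:].copy()

model = fit_pca_rf(X_train, y_train)
residual_std = np.std(y_cal - model.predict(X_cal))

# Seed a few artificial 10 deg C errors into the test observations.
error_idx = np.array([10, 120, 250, 400])
y_test[error_idx] += 10.0

flags = flag_suspect(model, X_test, y_test, residual_std)
clean = np.ones(len(y_test), dtype=bool)
clean[error_idx] = False

print("seeded errors detected:", int(flags[error_idx].sum()), "of", len(error_idx))
print("false alarm rate on clean data:", float(flags[clean].mean()))
```

Estimating the residual spread on a held-out calibration window rather than on the training data is a deliberate choice here, because a Random Forest's training residuals tend to understate its real prediction error.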

Full text