Abstract

The artificial neural network (ANN) is a powerful data-driven model that can capture and represent both linear and nonlinear relationships between input and output data. ANNs have therefore been widely used to predict and forecast water quality variables, because they can cope with the uncertainty of contaminant sources and the nonlinearity of water quality data. However, the sensitivity of training to the initial weight parameters and the imbalance of the training data set make it difficult to assess the optimality of the results obtained, and impede the performance of ANN modeling. This study sought to use an ensemble modeling technique to estimate the performance of the ANN free from the influence of the initial weight parameters, and to apply several clustering methods to alleviate the imbalance of the training data set. An ANN ensemble model was developed and applied to forecast the water quality variables pH, dissolved oxygen (DO), turbidity (Turb), total nitrogen (TN), and total phosphorus (TP) at Sangdong station on the Nakdong River. The optimal ANN model for each water quality variable was selected from the ensemble. The optimal ANN models for pH, DO, TN, and TP, whose training target data were evenly distributed, performed well, with R squared higher than 0.90. In contrast, the ANN model for Turb, whose training data set was imbalanced, showed a large RMSE (11.8 NTU) and a low R squared (0.58). The Turb training data set was therefore partitioned into several classes by conjunctive clustering methods, according to the patterns in the data set for each number of clusters. ANN ensemble models for Turb were then developed with the clustered training data set (clustered ANN models). All clustered ANN models for Turb performed better than the model without clustering. In particular, the three-cluster ANN model raised R squared from 0.58 to 0.88 and lowered the total RMSE from 11.8 NTU to 6.3 NTU.
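The ensemble idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic data, the network size, the learning rate, and the names `train_member` and `run_ensemble` are all assumptions made for illustration. The same small network is trained from several random initial weight sets, each member is scored by RMSE and R squared on held-out data, and the members are ranked so that the best one can be selected. The clustering step is not shown.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def train_member(xs, ys, seed, hidden=4, epochs=300, lr=0.02):
    """Train one tiny tanh network (1 input, 1 output) by stochastic
    gradient descent. Only the seed, and hence the initial weights,
    differs between ensemble members. All settings are illustrative."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            for j in range(hidden):
                # Gradient through tanh, computed before w2[j] is updated.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    return predict

def run_ensemble(train, valid, seeds):
    """Train one member per seed; return (rmse, r2, seed) sorted by RMSE."""
    results = []
    for seed in seeds:
        predict = train_member(train[0], train[1], seed)
        preds = [predict(x) for x in valid[0]]
        results.append((rmse(valid[1], preds), r_squared(valid[1], preds), seed))
    return sorted(results)

# Synthetic stand-in for a water quality series (an assumption, not real data).
data_rng = random.Random(0)
xs = [i / 40.0 for i in range(-40, 41)]
ys = [math.sin(3.0 * x) + data_rng.gauss(0.0, 0.05) for x in xs]
train = (xs[0::2], ys[0::2])   # even-indexed samples for training
valid = (xs[1::2], ys[1::2])   # odd-indexed samples for validation

ranking = run_ensemble(train, valid, seeds=range(5))
best_rmse, best_r2, best_seed = ranking[0]
print(f"best seed={best_seed}  RMSE={best_rmse:.3f}  R2={best_r2:.3f}")
```

The spread of scores across `ranking` shows how much a single run depends on its initial weights, which is the effect the ensemble is meant to remove from the performance estimate.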

  • Publication date: 2015-9