Abstract

This paper describes the use of supervised data mining to predict casing corrosion in carbon geological storage projects. The study treats four successive steps of supervised learning: 1) data pre-processing, such as missing-value handling and discretisation; 2) feature selection, using the correlation coefficient, signal-to-noise ratio, information gain, Gini index, and the k-nearest neighbour (KNN) approach; 3) classification, using decision trees (C4.5 and CART) and Bayesian networks; and 4) evaluation, using methods such as cross-validation. An experimental analysis of the casing corrosion problem within this supervised learning framework shows that data mining techniques are effective both at finding the features relevant to the problem and at building models that predict and identify casing corrosion.
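The abstract names several feature-scoring criteria without defining them. The Python sketch that follows shows one common way to compute three of them for a binary corrosion label: information gain, a Gini-impurity reduction, and a Golub-style signal-to-noise ratio. It is an illustration under stated assumptions, not the paper's implementation: the feature names (`co2_exposure`, `depth_band`, `numeric_x`) and all data values are hypothetical, and the exact Gini and SNR formulas used in the study may differ.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(feature, labels, impurity):
    """Drop in impurity when the labels are partitioned by a discrete feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def information_gain(feature, labels):
    """Entropy reduction, as used for attribute selection in C4.5-style trees."""
    return impurity_reduction(feature, labels, entropy)


def gini_gain(feature, labels):
    """Gini-impurity reduction, as used for attribute selection in CART-style trees."""
    return impurity_reduction(feature, labels, gini)


def signal_to_noise(values, labels, positive=1):
    """Golub-style SNR for a numeric feature: (mu_pos - mu_neg) / (sd_pos + sd_neg).

    Assumption: population standard deviations; other definitions exist.
    """
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    diff = statistics.mean(pos) - statistics.mean(neg)
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    if spread == 0:  # both classes constant: avoid division by zero
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / spread


# Hypothetical toy data: 1 = corroded casing, 0 = intact casing.
corroded = [1, 1, 1, 0, 0, 0, 1, 0]
co2_exposure = ["high", "high", "high", "low", "low", "low", "low", "high"]  # discretised
depth_band = ["a", "a", "b", "a", "a", "b", "b", "b"]  # discretised, uninformative
numeric_x = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0, 6.0, 2.0]  # continuous

if __name__ == "__main__":
    for name, feat in [("co2_exposure", co2_exposure), ("depth_band", depth_band)]:
        print(name, round(information_gain(feat, corroded), 3),
              round(gini_gain(feat, corroded), 3))
    print("numeric_x SNR", round(signal_to_noise(numeric_x, corroded), 3))
```

Higher scores mark a feature as more relevant to the label, so a filter-style selector would rank candidate features by these scores and keep the top few before training a classifier such as C4.5 or CART. On this toy data `co2_exposure` separates the classes partially, while `depth_band` splits them into two perfectly balanced groups, so both its information gain and its Gini gain are zero.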

  • Publication date: 2014