Abstract

Feature selection from imbalanced data plays an important role in building efficient decision support systems, improving the performance of the machine learning process, and enhancing classification accuracy. Feature selection becomes more difficult with imbalanced data, which arise in real-world domains when the classes in a data set are not equally represented. Traditional classifiers, which seek accurate performance over the full range of instances, are not suited to imbalanced learning tasks, since they tend to assign all the data to one class. In this paper, the Mahalanobis genetic algorithm (MGA) classifier is proposed to address the problem of feature selection for imbalanced welding data. The MGA classifier was benchmarked against the Mahalanobis-Taguchi system (MTS) classifier using the following metrics: the total misclassification error, the area under the receiver operating characteristic (ROC) curve (AUC), and the signal-to-noise (S/N) ratio. A real-life data set from a spot welding process was used as a pilot study. In terms of total misclassification error and AUC, the MGA showed better classification performance than the MTS. Very similar results were obtained when the training data set was balanced using the Synthetic Minority Oversampling Technique (SMOTE), which indicates that the MGA and MTS classifiers are suitable for imbalanced data sets without any preprocessing step. The S/N ratio results were inconsistent with the other classification metrics, which raises questions about its credibility as a metric.
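
As a rough illustration of the idea behind Mahalanobis-distance classification with a selected feature subset, the following minimal sketch scores samples by their scaled Mahalanobis distance from a reference class and counts total misclassifications for a given binary feature mask. It is not the paper's MGA or MTS implementation: the function names, the fixed threshold, and the use of a boolean mask as the candidate solution (the kind of object a genetic algorithm might search over) are all assumptions made for illustration.

```python
import numpy as np

def fit_reference(X_ref):
    """Estimate the mean and (pseudo-)inverse covariance of the reference class."""
    mean = X_ref.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_ref, rowvar=False))
    return mean, np.linalg.pinv(cov)

def scaled_md(X, mean, inv_cov):
    """Squared Mahalanobis distance divided by the number of features."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff) / X.shape[1]

def total_misclassification(md, y, threshold):
    """Count errors when samples with distance above the threshold are labeled 1."""
    pred = (md > threshold).astype(int)
    return int(np.sum(pred != y))

def evaluate_subset(mask, X_ref, X, y, threshold):
    """Score one candidate feature subset (hypothetical fitness function).

    mask is a 0/1 sequence selecting columns; a search procedure such as a
    genetic algorithm could minimize the value returned here.
    """
    mask = np.asarray(mask, dtype=bool)
    mean, inv_cov = fit_reference(X_ref[:, mask])
    md = scaled_md(X[:, mask], mean, inv_cov)
    return total_misclassification(md, y, threshold)
```

In this sketch, `X_ref` would hold the reference (majority-class) samples, `X` and `y` the labeled samples to classify, and `threshold` a cut-off chosen by the user; how the paper sets its threshold is not stated in the abstract.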

  • Publication date: 2015-3