摘要

With the development of computer networks, it has become easy to have huge databases. Accordingly, it is becoming difficult for users to extract knowledge from such databases. In this paper we focus on data mining, especially classification. In real-world data mining, the missing value problem occurs in cases such as speech containing noise, facial occlusion, and the like. When a test sample has missing values, a classification system cannot handle them. In previous studies, various imputation methods have been developed, with the objective of solving the missing value problem with numerous explanatory variables, even if some explanatory variables were ineffective for imputation. It has been said that the use of many variables degrades learning efficiency, and thus we believe that imputation methods should be developed considering the relations among explanatory variables. It is also effective to consider the relations between the test sample and each of the training samples. Therefore, we have proposed an imputation method using a Bayesian network with weighted learning. Experiments have confirmed that the proposed method imputes missed values with approximate values, and the classification system successfully classified test samples in which missed values were imputed by the proposed method, with better success than some conventional imputation methods.

  • 出版日期2012-12