Abstract

Traditionally, data mining tasks such as classification and clustering are performed on data warehouses. Updates are usually collected and applied to the data warehouse at frequent intervals, so all patterns derived from the data warehouse have to be updated frequently as well. Because of the very large volumes of data, it is highly desirable to perform these updates incrementally. This study proposes a new incremental genetic algorithm for classification that handles new transactions efficiently, and compares it with a traditional genetic algorithm for classification. Experimental results show that the incremental genetic algorithm considerably decreases the training time needed to construct a new classifier for the new dataset. The study also includes a sensitivity analysis of the incremental genetic algorithm's parameters, such as crossover probability, mutation probability, elitism and population size. In this analysis, many models were created using the same training dataset but different parameter values, and their performance was then compared.
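The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a classifier is encoded as a weight vector for a linear threshold rule, fitness is training accuracy, and "incremental" means that when new transactions arrive the search restarts from the previously evolved population instead of a random one. The names `fitness`, `random_population` and `evolve`, the encoding, and the tournament selection scheme are all hypothetical and are not taken from the study; only the four parameters (crossover probability, mutation probability, elitism, population size) come from the text.

```python
import random


def fitness(individual, data):
    """Accuracy of a linear threshold rule: predict 1 when w.x + b > 0."""
    correct = 0
    for features, label in data:
        score = sum(w * x for w, x in zip(individual[:-1], features)) + individual[-1]
        correct += int((score > 0) == bool(label))
    return correct / len(data)


def random_population(pop_size, n_genes, rng):
    """Population of random weight vectors (last gene is the bias)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)] for _ in range(pop_size)]


def evolve(data, population, generations, crossover_prob=0.8,
           mutation_prob=0.1, elitism=2, rng=None):
    """Run a simple generational GA starting from the given population.

    Passing a previously evolved population makes the run incremental
    (an assumption of this sketch, not the study's confirmed method).
    """
    if rng is None:
        rng = random.Random(0)
    pop = [list(ind) for ind in population]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
        # Elitism: the best individuals survive unchanged.
        next_pop = [list(ind) for ind in pop[:elitism]]
        while len(next_pop) < len(pop):
            # Binary tournament selection for each parent.
            a = max(rng.sample(pop, 2), key=lambda ind: fitness(ind, data))
            b = max(rng.sample(pop, 2), key=lambda ind: fitness(ind, data))
            child = list(a)
            if rng.random() < crossover_prob:
                cut = rng.randrange(1, len(a))  # one-point crossover
                child = a[:cut] + b[cut:]
            # Gaussian mutation applied gene by gene.
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < mutation_prob else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
    return pop


# Initial training: random start, more generations.
rng = random.Random(1)
old_data = [((0.0,), 0), ((1.0,), 0), ((3.0,), 1), ((4.0,), 1)]
initial = random_population(pop_size=10, n_genes=2, rng=rng)
trained = evolve(old_data, initial, generations=5, rng=rng)

# Incremental update: new transactions arrive, reuse the evolved population.
all_data = old_data + [((5.0,), 1)]
updated = evolve(all_data, trained, generations=2, rng=rng)
```

In this sketch the saving comes from running fewer generations on the update, since the seeded population already fits the old data. A sensitivity analysis like the one described would repeat such runs on the same training data while varying `crossover_prob`, `mutation_prob`, `elitism` and the population size.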

  • Publication date: 2011-3