Abstract

Compound sequence classification methods, which integrate multiple data mining techniques, have recently attracted considerable attention. Among them, sequence classifiers based on sequential pattern mining (SPM) are considered efficient for solving complex sequence classification problems. Although previous studies have demonstrated the strength of SPM-based sequence classification methods, three challenges remain unsolved: pattern redundancy, inappropriate sequence similarity measures, and hard-to-classify sequences. This paper proposes an efficient two-stage SPM-based sequence classification method to address these three problems. In the first stage, during sequential pattern mining, a sequential pattern is identified as redundant if it is a sub-sequence of another sequential pattern. The redundant patterns are excluded, and the resulting list of compact sequential patterns serves as the representative features for the second stage. In the second stage, a sequence similarity measure is used to evaluate the partial similarity between sequences and patterns. Finally, a particle swarm optimization-AdaBoost (PSO-AB) sequence classifier is developed to improve sequence classification accuracy. In the PSO-AB sequence classifier, the PSO algorithm optimizes the weights in the individual sequence classifier, while the AdaBoost strategy adaptively changes the distribution of patterns that are hard to classify. The experiments show that the proposed two-stage SPM-based sequence classification method is efficient and superior to other approaches.
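
The first-stage redundancy rule can be illustrated with a minimal sketch. It assumes that each pattern is an ordered tuple of single items and that "sub-sequence" means an order-preserving, not necessarily contiguous, match. The names `is_subsequence` and `compact_patterns` are hypothetical and are not taken from the paper, whose mining procedure may differ in detail.

```python
from typing import Iterable, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """Return True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    # `item in it` advances the iterator until it finds `item`,
    # so later items must appear after earlier ones.
    return all(item in it for item in short)


def compact_patterns(patterns: Iterable[Sequence[str]]) -> List[Pattern]:
    """Drop every pattern that is a sub-sequence of another, distinct pattern."""
    # Deduplicate while preserving the input order.
    unique: List[Pattern] = list(dict.fromkeys(tuple(p) for p in patterns))
    return [
        p
        for p in unique
        if not any(p != q and is_subsequence(p, q) for q in unique)
    ]


if __name__ == "__main__":
    mined = [("a", "b"), ("a", "b", "c"), ("b", "d"), ("a",)]
    print(compact_patterns(mined))
```

In this toy input, `("a", "b")` and `("a",)` are both contained in `("a", "b", "c")`, so only `("a", "b", "c")` and `("b", "d")` would be kept as representative features.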

  • Publication date: 2015-2