摘要

Rough set approach is one of effective feature selection methods that can preserve the meaning of the features. The essence of feature selection based on rough set approach is to find a subset of the original features (attributes) using rough set theory. So far, many feature selection (also called feature reduction) methods based on rough set have been proposed, where numerous experimental results have demonstrated that these methods based on discernibility matrix are concise and efficient, but have much higher space complexity. In order to reduce the storage space of the existing feature selection methods based on discernibility matrix, in this paper, we introduce a novel condensing tree structure (C-Tree), which is an extended order-tree, every non-empty element of a discernibility matrix is mapped to one path in the C-Tree and a lot of non-empty elements may share the same path or prefix, so the C-Tree has much lower space complexity as compared to discernibility matrix. Moreover, our feature selection algorithms employ the C-Tree structure and incorporate some heuristic strategies, hence efficiently reduce both space and computational complexities. Algorithms of this paper are experimented using some standard datasets and synthetic datasets for testing both time and space complexities. Experimental results show that the algorithms of this paper can efficiently reduce the cost of storage and be computationally inexpensive when compared to the existing algorithms based on discernibility matrix for feature reduction.