Abstract

Multi-label classification is an active research field in machine learning. Because of the high dimensionality of multi-label data, attribute reduction (also known as feature selection) is often necessary to improve multi-label classification performance. Rough set theory has been widely used for attribute reduction with much success. However, little work has been done on applying rough set theory to attribute reduction in multi-label classification. In this paper, a novel attribute reduction method based on rough set theory is proposed for multi-label data. First, the uncertainties conveyed by labels are analyzed, and a new type of attribute reduct, called the complementary decision reduct, is introduced. The relationships between the complementary decision reduct and two representative types of attribute reducts are also investigated, showing that the complementary decision reduct has significant advantages in revealing the uncertainties implied in multi-label data. Second, a discernibility-matrix-based approach is introduced for computing all complementary decision reducts, and a heuristic algorithm is proposed for effectively computing a single complementary decision reduct. Experiments on real-life data demonstrate that the proposed approach can effectively remove unnecessary attributes and improve multi-label classification accuracy.