Abstract

This paper presents a new decision tree algorithm for multi-valued and multi-labeled data. The algorithm first introduces a new formula for measuring the similarity between two label-sets in the child nodes. The formula accounts for labels that appear in both label-sets, labels that are absent from both, and the boundary condition. A coefficient reduces the weight given to labels that are absent from both sets, which makes the similarity calculation more comprehensive and accurate. We also propose new conditions under which a node stops splitting, together with a corresponding prediction method. Experiments comparing the algorithm with existing algorithms show that it achieves higher accuracy and is better suited to classifying data with multi-valued attributes and multiple labels.
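
The abstract does not state the similarity formula itself, so the following is only a minimal sketch of the general idea, not the paper's measure. It assumes a ratio in which labels absent from both sets are counted with a down-weighting coefficient. The function name `label_set_similarity`, the parameter `alpha`, and the handling of the empty case are all hypothetical choices made for illustration.

```python
def label_set_similarity(a, b, all_labels, alpha=0.5):
    """Illustrative similarity between two label-sets (an assumed form,
    not the formula from the paper).

    both    : labels present in both sets
    neither : labels absent from both sets, weighted by alpha
    differ  : labels present in exactly one of the two sets
    """
    a, b = set(a), set(b)
    # Treat any label seen in a or b as part of the label universe.
    universe = set(all_labels) | a | b

    both = len(a & b)
    neither = len(universe - (a | b))
    differ = len(a ^ b)

    denominator = both + alpha * neither + differ
    if denominator == 0:
        # Assumed boundary case: nothing to compare, so treat the
        # two (empty) label-sets as identical.
        return 1.0
    return (both + alpha * neither) / denominator


labels = {"L1", "L2", "L3", "L4"}
sim = label_set_similarity({"L1", "L2"}, {"L1", "L3"}, labels, alpha=0.5)
```

In this sketch, `alpha = 1` gives the simple matching coefficient and `alpha = 0` gives the Jaccard index, so intermediate values weaken, without discarding, the contribution of labels that neither set contains.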

  • Publication date: 2011
