Abstract

Selecting an appropriate classification algorithm for a given data set is important and useful in practice. One of the most challenging issues in algorithm selection is how to characterize different data sets. Recently, we characterized a data set by extracting its structural information. Although these characteristics work well for identifying similar data sets and recommending appropriate classification algorithms, the extraction method applies only to binary data sets and its performance is limited. In this paper, we therefore propose an improved data set characterization method that addresses both problems. To evaluate the effectiveness of the improved method for algorithm recommendation, the unsupervised expectation-maximization (EM) method is employed to build the algorithm recommendation model. Extensive experiments with 17 different types of classification algorithms are conducted on 84 public UCI data sets; the results demonstrate the effectiveness of the proposed method.