Abstract

One issue encountered in classification and regression is the processing inefficiency caused by the large number of input dimensions in the training data set. Many dimensionality reduction approaches have been proposed to address this issue by reducing the number of input dimensions while maintaining the generalization capability of the original data set. However, regression has received less attention than classification. Moreover, most existing methods rely on computation with covariance matrices, which makes the reduction process inefficient. In this paper, we propose a machine learning based dimensionality reduction approach for regression problems. For a given set of training instances, a group of clusters is formed such that the instances in the same cluster are similar to each other. One new feature is then extracted from each cluster through a weighted combination of the training instances, so the dimensionality of the original data set is reduced. The clusters are created incrementally and automatically, without requiring the user to specify the number of clusters in advance. The characteristics of the original data set are substantially retained, since all the original features are involved in deriving the extracted features. In addition, computation with covariance matrices is avoided, which keeps the reduction process efficient. A number of experiments on real-world data sets are conducted to demonstrate the effectiveness of the proposed approach.