Abstract

Clusterwise regression aims to cluster data sets in which each cluster is characterized by its own regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition that borrows an idea from robust regression. We start with a random weighting to determine an initial partition and then proceed in the spirit of M-estimators. The residuals from all regressions are used to assign the observations to the different groups. As the target function, we use the coefficient of determination R_w^2 for the overall model, suitably defined for weighted regression.
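As an illustration of a coefficient of determination for weighted regression, the following minimal sketch handles the single-predictor case. The abstract does not give the exact definition of R_w^2, so the form used here (one minus the ratio of the weighted residual sum of squares to the weighted total sum of squares about the weighted mean) is an assumption, and the names weighted_fit and weighted_r2 are hypothetical.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def weighted_r2(x, y, w):
    """Assumed definition: 1 - weighted SS_res / weighted SS_tot."""
    a, b = weighted_fit(x, y, w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_tot = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot
```

With all weights equal, this reduces to the ordinary coefficient of determination of a simple linear regression.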
Target functions for the clusterwise regression problem may have a large number of local optima, which derivative-based optimization methods cannot handle. The approach commonly employed to overcome this problem is to start several times from random partitions and then to improve each resulting partition. Because our procedure is very fast, it can be run from many random starts. Finally, the solution with the highest coefficient of determination R_w^2 for the overall model is chosen. The performance of the method is investigated by means of Monte Carlo simulations and compared with the finite-mixture approach to clusterwise regression. A sequence of bootstrap tests is proposed to determine the number of clusters.
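The multi-start strategy described above can be sketched as follows. This is not the authors' procedure: it replaces their M-estimator-style weighting with a simple hard reassignment of each observation to the line with the smallest squared residual, uses a single predictor, and scores each run with an unweighted overall coefficient of determination. The names fit_line, one_start and multi_start, and the default of 100 starts, are hypothetical choices for illustration.

```python
import random


def fit_line(pts):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(pts)
    xbar = sum(x for x, _ in pts) / n
    ybar = sum(y for _, y in pts) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    if sxx == 0:
        return ybar, 0.0  # degenerate case: all x equal, use a flat line
    b = sum((x - xbar) * (y - ybar) for x, y in pts) / sxx
    return ybar - b * xbar, b


def one_start(data, k, rng, max_iter=50):
    """One random start: random partition, then fit/reassign until stable."""
    labels = [rng.randrange(k) for _ in data]
    lines = []
    for _ in range(max_iter):
        lines = []
        for g in range(k):
            pts = [p for p, lab in zip(data, labels) if lab == g]
            if len(pts) < 2:
                pts = rng.sample(data, 2)  # re-seed a (nearly) empty cluster
            lines.append(fit_line(pts))
        # Assign each observation to the line with the smallest squared residual.
        new = [min(range(k), key=lambda g: (y - lines[g][0] - lines[g][1] * x) ** 2)
               for x, y in data]
        if new == labels:
            break
        labels = new
    ybar = sum(y for _, y in data) / len(data)
    ss_res = sum((y - lines[lab][0] - lines[lab][1] * x) ** 2
                 for (x, y), lab in zip(data, labels))
    ss_tot = sum((y - ybar) ** 2 for _, y in data)
    return 1.0 - ss_res / ss_tot, labels, lines


def multi_start(data, k, n_starts=100, seed=0):
    """Run many random starts and keep the run with the highest overall R^2."""
    rng = random.Random(seed)
    return max((one_start(data, k, rng) for _ in range(n_starts)),
               key=lambda run: run[0])
```

Calling multi_start(data, 2) on a list of (x, y) pairs returns the best overall coefficient of determination found, the cluster labels, and the fitted (intercept, slope) pairs. Because each start is cheap, raising n_starts is the simplest way to reduce the risk of stopping in a poor local optimum.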

  • Publication date: 2011-6