摘要

Cost estimation is one of the most critical activities in software life cycle. In past decades, a number of techniques have been proposed for cost estimation. Linear regression is yet the most frequently applied method in the literature. However, a number of studies point out that linear regression is prone to low prediction accuracy. The low prediction accuracy is due to a number of reasons such as non-linearity and non-normality. One less addressed reason is the multi-collinearities which may lead to unstable regression coefficients. On the other hand, it has been reported that multi-collinearity spreads widely across the software engineering datasets. To tackle this problem and improve regression's accuracy, we propose a holistic problem-solving approach (named adaptive ridge regression system) integrating data transformation, multi-collinearity diagnosis, ridge regression technique and multi-objective optimization. The proposed system is tested on two real world datasets with the comparisons with OLS regression, step-wise regression and other machine learning methods. The results indicate that adaptive ridge regression system can significantly improve the performance of regressions on multi-collinear datasets and produce more explainable results than machine learning methods.

  • 出版日期2010-11