摘要

Software product quality can be enhanced significantly if we have a good knowledge and understanding of the potential faults therein. This paper describes a study to build predictive models to identify parts of the software that have high probability of occurrence of fault. We have considered the effect of thresholds of object-oriented metrics on fault proneness and built predictive models based on the threshold values of the metrics used. Prediction of fault prone classes in earlier phases of software development life cycle will help software developers in allocating the resources efficiently. In this paper, we have used a statistical model derived from logistic regression to calculate the threshold values of object oriented, Chidamber and Kemerer metrics. Thresholds help developers to alarm the classes that fall outside a specified risk level. In this way, using the threshold values, we can divide the classes into two levels of risk - low risk and high risk. We have shown threshold effects at various risk levels and validated the use of these thresholds on a public domain, proprietary dataset, KC1 obtained from NASA and two open source, Promise datasets, IVY and JEdit using various machine learning methods and data mining classifiers. Interproject validation has also been carried out on three different open source datasets, Ant and Tomcat and Sakura. This will provide practitioners and researchers with well formed theories and generalised results. The results concluded that the proposed threshold methodology works well for the projects of similar nature or having similar characteristics.

  • 出版日期2015-4