A novel kernel to predict software defectiveness

作者:Okutan Ahmet*; Yildiz Olcay Taner
来源:Journal of Systems and Software, 2016, 119: 109-121.
DOI:10.1016/j.jss.2016.06.006

摘要

Although the software defect prediction problem has been researched for a long time, the results achieved are not so bright. In this paper, we propose to use novel kernels for defect prediction that are based on the plagiarized source code, software clones and textual similarity. We generate precomputed kernel matrices and compare their performance on different data sets to model the relationship between source code similarity and defectiveness. Each value in a kernel matrix shows how much parallelism exists between the corresponding files of a software system chosen. Our experiments on 10 real world datasets indicate that support vector machines (SVM) with a precomputed kernel matrix performs better than the SVM with the usual linear kernel in terms of F-measure. Similarly, when used with a precomputed kernel, the k-nearest neighbor classifier (KNN) achieves comparable performance with respect to KNN classifier. The results from this preliminary study indicate that source code similarity can be used to predict defect proneness.

  • 出版日期2016-9