A Machine Learning Based Approach for Evaluating Clone Detection Tools for a Generalized and Accurate Precision

Authors: Svajlenko Jeffrey; Roy Chanchal K
Source: International Journal of Software Engineering and Knowledge Engineering, 2016, 26(9-10): 1399-1429.
DOI: 10.1142/S0218194016400106

Abstract

An important measure of clone detection performance is precision. However, there has been a marked lack of research into methods for efficiently and accurately measuring the precision of a clone detection tool. Instead, tool authors simply validate a small random sample of the clones their tools detected in a subject software system. Since a tool may report many thousands of clones, such a small random sample cannot guarantee an accurate and generalized measure of the tool’s precision for all the varieties of clones that can occur in an arbitrary software system. In this paper, we propose a machine-learning-based approach that clusters similar clones together and can be used to maximize the variety of clones examined when measuring precision, while significantly reducing the biases a specific subject system imposes on the generality of the measured precision. Our technique reduces the effort required to measure precision, while doubling the variety of clones validated and reducing biases that harm the generality of the measure by up to an order of magnitude. Our case study with the NiCad clone detector and the Java class library shows that our approach is effective at efficiently measuring an accurate and generalized precision of a subject clone detection tool.

  • Publication date: 2016-12