Automated triaging of very large bug repositories

作者:Banerjee Sean*; Syed Zahid; Helmick Jordan; Culp Mark; Ryan Kenneth; Cukic Bojan
来源:Information and Software Technology, 2017, 89: 1-13.
DOI:10.1016/j.infsof.2016.09.006

摘要

Context: Bug tracking systems play an important role in software maintenance. They allow both developers and users to submit problem reports on observed failures. However, by allowing anyone to submit problem reports, it is likely that more than one reporter will report on the same issue. Research in open source repositories has focused on two broad areas: determining the original report associated with each known duplicate, and assigning a developer to fix a particular problem. Objective: Limited research has been done in developing a fully automated triager, one that can first ascertain if a problem report is original or duplicate, and then provide a list of 20 potential matches for a duplicate report. We address this limitation by developing an automated triaging system that can be used to assist human triagers in bug tracking systems. Method: Our automated triaging system automatically assigns a label of original or duplicate to each incoming problem report, and provides a list of 20 suggestions for reports classified as duplicate. The system uses 24 document similarity measures and associated summary statistics, along with a suite of document property and user metrics. We perform our research on a lifetime of problem reports from the Eclipse, Firefox and Open Office repositories. Results: Our system can be used as a filtration aide, with high original recall exceeding 95% and low duplicate recall, or as a triaging guide, with balanced recall of approximately 70% for both originals and duplicates. Furthermore, the system reduces the workload on the triager by over 90%. Conclusions: Our work represents the first full scale effort at automatically triaging problem reports in open source repositories. By utilizing multiple similarity measures, we reduce the potential of false matches caused by the diversity of human language.

  • 出版日期2017-9