摘要

An important research area of Spectrum-Based Fault Localization (SBFL) is the effectiveness of risk evaluation formulas. Most previous studies have adopted an empirical approach, which can hardly be considered as sufficiently comprehensive because of the huge number of combinations of various factors in SBFL. Though some studies aimed at overcoming the limitations of the empirical approach, none of them has provided a completely satisfactory solution. Therefore, we provide a theoretical investigation on the effectiveness of risk evaluation formulas. We define two types of relations between formulas, namely, equivalent and better. To identify the relations between formulas, we develop an innovative framework for the theoretical investigation. Our framework is based on the concept that the determinant for the effectiveness of a formula is the number of statements with risk values higher than the risk value of the faulty statement. We group all program statements into three disjoint sets with risk values higher than, equal to, and lower than the risk value of the faulty statement, respectively. For different formulas, the sizes of their sets are compared using the notion of subset. We use this framework to identify the maximal formulas which should be the only formulas to be used in SBFL.