RAPT: Rare Class Prediction in Absence of True Labels

Authors: Mithal Varun*; Nayak Guruprasad; Khandelwal Ankush; Kumar Vipin; Oza Nikunj C; Nemani Ramakrishna
Source: IEEE Transactions on Knowledge and Data Engineering, 2017, 29(11): 2484-2497.
DOI: 10.1109/TKDE.2017.2739739

Abstract

Many real-world problems involve learning models for rare classes in situations where there are no gold-standard labels for training samples but imperfect labels are available for all instances. In this paper, we present RAPT, a three-step predictive modeling framework for classifying the rare class in such problem settings. The first step of the proposed framework learns a classifier that jointly optimizes precision and recall using only imperfectly labeled training samples. We also show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as that of one constructed using perfect labels. The second and third steps of the framework make use of the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class. We evaluate the RAPT framework on two real-world applications: mapping forest fires and mapping urban extent from Earth-observing satellite data. The experimental results indicate that RAPT can identify forest fires and urban areas with high precision and recall using imperfect labels, even though obtaining expert-annotated samples on a global scale is infeasible in these applications.
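To make the idea behind the first step concrete, here is a minimal sketch of choosing a decision threshold by scoring candidate thresholds against imperfect labels only. It is not the authors' implementation. The abstract says only that precision and recall are optimized jointly, so the use of their product as the objective is an assumption, as is the existence of a per-instance classifier score. The function names `precision_recall` and `pick_threshold` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(pred: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall of the positive (rare) class, measured against `labels`."""
    true_pos = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    pred_pos = sum(pred)
    label_pos = sum(labels)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / label_pos if label_pos else 0.0
    return precision, recall


def pick_threshold(
    scores: Sequence[float], imperfect_labels: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Return the score threshold that maximizes precision * recall.

    Both metrics are computed against the imperfect labels, because no
    gold-standard labels are assumed to exist. The product objective is
    an assumption made for this sketch.
    """
    best_threshold, best_objective = None, -1.0
    for threshold in sorted(set(scores)):
        # Predict the rare class for every instance scoring at or above the threshold.
        pred = [1 if s >= threshold else 0 for s in scores]
        precision, recall = precision_recall(pred, imperfect_labels)
        objective = precision * recall
        if objective > best_objective:
            best_threshold, best_objective = threshold, objective
    return best_threshold, best_objective
```

Calling `pick_threshold(scores, imperfect_labels)` with one score and one imperfect 0/1 label per instance returns the selected threshold and the objective value it achieved. Whether a threshold chosen this way is close to the one perfect labels would give depends on the assumptions about the imperfect labels that the abstract refers to.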

  • Publication date: 2017-11-1