Abstract

Most current credit-scoring models are built solely on accepted samples and therefore suffer from sample bias. This bias is particularly severe in peer-to-peer (P2P) lending because of its comparatively high rejection rate. Reject inference mitigates sample bias by inferring the likely outcomes of rejected samples and incorporating them into credit-score modeling. This study addresses reject inference in the P2P lending domain from the perspective of semi-supervised learning. A novel reject inference method, CPLE-LightGBM, is proposed by combining the contrastive pessimistic likelihood estimation (CPLE) framework with an advanced gradient boosting decision tree classifier (LightGBM). The performance of CPLE-LightGBM is validated on multiple datasets, and the results demonstrate the efficiency of the proposal. An analysis of how the rejection rate influences predictive accuracy reveals the usefulness of sampling from the rejected datasets.