Abstract

In the credit market, assessing a borrower's default risk over time is essential for timely risk management, because borrowers' exposure to risk and the losses that result from defaults depend strongly on when default occurs. Mixture cure models, which predict not only whether a borrower will default but also when the default is likely to occur, have been applied to credit scoring. We propose a prediction-driven mixture cure model, which sacrifices interpretability for potentially better predictive performance, and apply it to credit scoring. In the incidence part of the model, we replace the typical statistical incidence model (logistic regression) with a more flexible, and potentially more accurate, classification method (random forests). For the latency part, we propose a survival analysis model, named Time-Dependent Hazards, which accommodates a direct relationship between failure times and covariates and can potentially predict the probability of default over time better than the standard Cox proportional hazards (PH) model. Empirical evaluation on real-world data from a major peer-to-peer (P2P) lending institution in China shows that both extensions improve performance in terms of discrimination and calibration.