Abstract

The use of data mining for disease prediction and diagnosis has become a focus of attention. Data mining provides an important means of extracting valuable medical rules hidden in medical data, and it plays an important role in disease prediction and clinical diagnosis. This paper surveys several popular data mining techniques for disease prediction and diagnosis, such as decision trees, association rule analysis, and cluster analysis. A novel hybrid method that combines random forest with multivariate adaptive regression splines (MARS) is then proposed for building disease prediction models. First, the random forest algorithm is used to perform a preliminary screening of the variables and to obtain an importance ranking. Then, the reduced dataset, restricted to the top-k most important predictors, is passed to the MARS procedure, which builds interpretable models for predicting disease survivability. The performance of the combined method is evaluated with basic performance measures (e.g., accuracy, sensitivity, and specificity) under 10-fold cross-validation. Experimental results show that the proposed method achieves higher accuracy while yielding a relatively simple model.
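The evaluation protocol named above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `k_fold_indices` and `binary_metrics` are hypothetical, the folds are simple contiguous blocks (no shuffling or stratification is assumed), and the class labels are assumed to be coded as 1 (positive) and 0 (negative).

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper for illustration; real studies usually shuffle
    or stratify the data before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of actual positives predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of actual negatives predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

In a full experiment, a model would be fitted on each fold's training indices and scored on its test indices with `binary_metrics`, and the per-fold values would then be averaged.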