Abstract

Healthcare data analysis is currently a challenging and crucial research problem for developing robust disease diagnosis and prediction systems. The literature discusses many dataset-specific methods, but only a few common methods, for classifying healthcare data. The present study implements 32 classification methods from six categories (Bayes, function-based, lazy, meta, rule-based, and tree-based) with the aim of identifying the best-performing and most generally applicable categories and methods for healthcare data mining. The performance of each method was evaluated in terms of analysis time, classification accuracy, precision, recall, F-measure, area under the receiver operating characteristic curve, root mean square error, kappa coefficient, Kulczynski's measure, and Fowlkes-Mallows index, and was compared with more than 90 classification methods used in past studies. Seventeen healthcare datasets related to thyroid, cancer, skin disease, heart disease, hepatitis, lymphography, audiology, diabetes, surgery, arrhythmia, postsurvival, liver, and tumour were used to assess the classification methods. The tree-based methods performed better than the other categories, with an average classification accuracy of 79.92% and a maximum accuracy of 99.50%; the logistic model tree classifier required an analysis time of 3.91 s. Furthermore, the association between datasets and classification methods is discussed.
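Several of the evaluation measures listed above can be derived from a confusion matrix. The sketch below is a minimal illustration, assuming a binary problem with counts tp, fp, fn, and tn (the function name `binary_metrics` is hypothetical). It is not necessarily how the study computed these measures; multi-class datasets typically require per-class values that are then averaged. AUC and root mean square error are omitted because they need predicted scores or probabilities, not just counts.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Hypothetical helper: compute several of the abstract's evaluation
    measures from a binary confusion matrix (assumed positive = disease)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement uses predicted and actual class marginals
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    # Kulczynski's measure: arithmetic mean of precision and recall
    kulczynski = (precision + recall) / 2
    # Fowlkes-Mallows index: geometric mean of precision and recall
    fowlkes_mallows = math.sqrt(precision * recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
        "kulczynski": kulczynski,
        "fowlkes_mallows": fowlkes_mallows,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the study
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy is 0.85 and kappa is 0.7. Kulczynski's measure and the Fowlkes-Mallows index differ only in using the arithmetic versus the geometric mean of precision and recall.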