摘要

Objective. To identify variables correlated with a diagnosis of diabetic peripheral neuropathy (DPN) using random forest modeling applied to electronic health records. Design. Retrospective analysis. Setting. Humedica de-identified electronic health records database. Subjects. Subjects >= 18 years old with type 2 diabetes from January 1, 2008-September 30, 2013 having continuous data for 1 year pre- and post-index with DPN (n = 35,050) and without DPN (n = 288,328) were identified. Methods. Demographic, clinical, and health care resource utilization variables (e.g., inpatient and outpatient encounters, medications, and procedures) were input into a random forest model to identify the most important correlates of a DPN diagnosis. Random forest modeling is a computationally extensive, robust data mining technique that accommodates large sets of variables to identify associated factors using an ensemble of classifications trees. Accuracy of the model was evaluated using receiver operating characteristic curves (ROC). Results. The final random forest model consisted of the following variables (importance) associated with a DPN diagnosis: Charlson Comorbidity Index score (100%), age (37.1%), number of pre-index procedures and services (29.7%), number of pre-index outpatient prescriptions (24.2%), number of pre-index outpatient visits (18.3%), number of pre-index laboratory visits (16.9%), number of pre-index outpatient office visits (12.1%), number of inpatient prescriptions (5.9%), and number of pain-related medication prescriptions (4.4%). ROC analysis confirmed model performance, with an area under the curve of 0.824 and accuracy of 89.6% (95% confidence interval 89.4%, 89.8%). Conclusions. Random forest modeling can determine likelihood of a DPN diagnosis. Further validation of the random forest model may help facilitate earlier diagnosis and enhance management strategies.

  • 出版日期2017-1