Automated categorisation of clinical incident reports using statistical text classification

Authors: Ong Mei Sing*; Magrabi Farah; Coiera Enrico
Source: Quality and Safety in Health Care, 2010, 19(6): e55.
DOI: 10.1136/qshc.2009.036657

Abstract

Objectives To explore the feasibility of using statistical text classification techniques to automatically categorise clinical incident reports.
Methods Statistical text classifiers based on Naive Bayes and Support Vector Machine algorithms were trained and tested on incident reports submitted by public hospitals to identify two classes of clinical incidents: inadequate clinical handover and incorrect patient identification. Each classifier was trained on 600 reports (300 positives, 300 negatives) and tested on 372 reports (248 positives, 124 negatives). The results were evaluated using standard measures: accuracy, precision, recall, F-measure and area under the receiver operating characteristic (ROC) curve (AUC). Classifier learning rates were also evaluated by measuring classifier accuracy as a function of training set size.
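The evaluation measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function names `classification_metrics` and `roc_auc`, the confusion counts and the classifier scores are all hypothetical, and the F-measure is assumed to be the balanced form (harmonic mean of precision and recall).

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall (assumed form).
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive report outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores for illustration only, not data from the study.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
auc = roc_auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"auc: {auc:.4f}")
```

The pairwise AUC loop is quadratic in the number of reports, which is acceptable for a test set of a few hundred reports; a rank-based formulation would be preferable for larger sets.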
Results All classifiers performed well in categorising clinical handover and patient identification incidents. Naive Bayes attained the best performance on handover incidents, correctly identifying 86.29% of reporter-classified incidents (precision=0.84, recall=0.90, F-measure=0.87, AUC=0.93) and 91.53% of expert-classified incidents (precision=0.87, recall=0.98, F-measure=0.92, AUC=0.97). For patient identification incidents, the best results were obtained when Support Vector Machine with radial-basis function kernel was used to classify reporter-classified reports (accuracy=97.98%, precision=0.98, recall=0.98, F-measure=0.98, AUC=1.00); and when Naive Bayes was used on expert-classified reports (accuracy=95.97%, precision=0.95, recall=0.98, F-measure=0.96, AUC=0.99). A relatively small training set was found to be adequate, with most classifiers achieving an accuracy above 80% when the training set size was as small as 100 samples.
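As a quick arithmetic check, the reported F-measures can be recomputed from the reported precision and recall. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall, which the abstract does not state explicitly; because the inputs are themselves rounded to two decimals, agreement is only expected at that precision. The helper name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (assumed form)."""
    return 2 * precision * recall / (precision + recall)


# (description, precision, recall, reported F-measure), copied from the Results.
reported = [
    ("Naive Bayes, handover, reporter-classified", 0.84, 0.90, 0.87),
    ("Naive Bayes, handover, expert-classified", 0.87, 0.98, 0.92),
    ("SVM (RBF), patient identification, reporter-classified", 0.98, 0.98, 0.98),
    ("Naive Bayes, patient identification, expert-classified", 0.95, 0.98, 0.96),
]

# Recompute each F-measure and round to the two decimals used in the abstract.
recomputed = [
    (label, round(f_measure(p, r), 2), f_reported)
    for label, p, r, f_reported in reported
]

for label, f_calc, f_reported in recomputed:
    print(f"{label}: recomputed={f_calc:.2f}, reported={f_reported:.2f}")
```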
Conclusions This study demonstrates the feasibility of using text classification techniques to automatically categorise clinical incident reports.

  • Publication date: 2010-12