摘要

This paper presents a supervised learning approach to support the screening of HIV literature. The manual screening of biomedical literature is an important task in the process of systematic reviews. Researchers and curators have the very demanding, time-consuming, and error-prone task of manually identifying documents that should be included in a systematic review concerning a specific problem. We developed a supervised learning approach to support screening tasks, by automatically flagging potentially relevant documents from a list retrieved by a literature database search. To overcome the main issues associated with the automatic literature screening task, we evaluated the use of data sampling, feature combinations, and feature selection methods, generating a total of 105 classification models. The models yielding the best results were composed of a Logistic Model Trees classifier, a fairly balanced training set, and feature combination of Bag-Of-Words and MeSH terms. According to our results, the system correctly labels the great majority of relevant documents, making it usable to support HIV systematic reviews to allow researchers to assess a greater number of documents in less time.

  • 出版日期2016-6