Automatic discovery of adverse reactions through Chinese social media

Authors: Zhang, Mengxue; Zhang, Meizhuo; Ge, Chen; Liu, Quanyang; Wang, Jiemin; Wei, Jia*; Zhu, Kenny Q.*
Source: Data Mining and Knowledge Discovery, 2019, 33(4): 848-870.
DOI: 10.1007/s10618-018-00610-2

Abstract

Despite the tremendous effort invested before the release of every drug, some adverse drug reactions (ADRs) may go undetected and thus cause harm to both users and pharmaceutical companies. One plausible venue for collecting evidence of such ADRs is online social media, where patients and doctors discuss medical conditions and their treatments. There is substantial previous research on ADR extraction from English online forums, but very little research has been done on Chinese data. In this paper, we use posts from two popular Chinese social media platforms as the original dataset. We propose a semi-supervised learning framework that detects mentions of medications and colloquial ADR terms and extracts lexico-syntactic features from natural language text to recognize positive associations between drug use and ADRs. The key contribution is an automatic label generation algorithm, which requires very little manual annotation. This bootstrapping algorithm could also be applied to English data. The results indicate that our algorithm outperforms the hidden Markov model and conditional random fields. With this approach, we discovered a large number of side effects for a variety of popular medicines in real-world scenarios.