Notifiable infectious disease surveillance with data collected by search engine

Authors: Zhou Xi chuan; Shen Hai bin*
Source: Journal of Zhejiang University-Science C (Computers and Electronics), 2010, 11(4): 241-248.
DOI: 10.1631/jzus.C0910371

Abstract

Notifiable infectious diseases are a major public health concern in China, causing about five million illnesses and twelve thousand deaths every year. Early detection of disease activity, when followed by a rapid response, can reduce both the social and the medical impact of the disease. We aim to improve early detection by monitoring health-seeking behavior and disease-related news over the Internet. Specifically, we counted the unique search queries submitted to the Baidu search engine in 2008 that contained disease-related search terms. We also counted the news articles aggregated by Baidu's robot programs that contained disease-related keywords. We found that the search frequency data and the news count data both have a distinct temporal association with disease activity. We adopted a linear model and used searches and news with lead times of 1 to 200 days as explanatory variables to predict the number of infections and deaths attributable to four notifiable infectious diseases: scarlet fever, dysentery, AIDS, and tuberculosis. With the search frequency data and news count data, our approach can quantitatively estimate up-to-date epidemic trends 10 to 40 days ahead of the release of Chinese Centers for Disease Control and Prevention (Chinese CDC) reports. This approach may provide an additional tool for notifiable infectious disease surveillance.
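The abstract does not give the exact form of the linear model, so the following is only a minimal sketch of one way a lagged linear regression of this kind could be set up. It assumes daily series of equal length, ordinary least squares as the fitting method, and two arbitrarily chosen lead times. The function names (`lagged_design`, `fit_lagged_linear`), the chosen lags, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def lagged_design(search, news, lags):
    """Build a design matrix: an intercept plus search and news counts
    shifted back by each lead time in `lags` (hypothetical helper)."""
    max_lag = max(lags)
    n = len(search) - max_lag  # rows for which every lagged value exists
    cols = [np.ones(n)]
    for lag in lags:
        start = max_lag - lag
        cols.append(search[start:start + n])  # search count `lag` days earlier
        cols.append(news[start:start + n])    # news count `lag` days earlier
    return np.column_stack(cols), max_lag


def fit_lagged_linear(search, news, cases, lags):
    """Fit case counts on lagged search and news counts by ordinary least
    squares (an assumed fitting method, not confirmed by the source)."""
    X, max_lag = lagged_design(search, news, lags)
    y = cases[max_lag:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic illustration only: cases depend on searches 10 days earlier
# and on news articles 20 days earlier.
rng = np.random.default_rng(0)
days = 200
search = rng.poisson(100, size=days).astype(float)
news = rng.poisson(20, size=days).astype(float)
cases = np.zeros(days)
cases[20:] = 2.0 + 3.0 * search[10:days - 10] + 0.5 * news[:days - 20]

lags = [10, 20]
coef = fit_lagged_linear(search, news, cases, lags)
# Coefficient order: intercept, search@10, news@10, search@20, news@20
print(np.round(coef, 3))
```

In practice the 1 to 200 day range of lead times mentioned above would give many more candidate columns than this sketch uses, so some form of lag selection or regularization would likely be needed; the source does not say how that was handled.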