A named entity recognition approach for tweet streams using active learning

Authors: Van Cuong Tran; Dinh Tuyen Hoang; Ngoc Thanh Nguyen; Hwang Dosam*
Source: Journal of Intelligent and Fuzzy Systems, 2017, 32(2): 1277-1287.
DOI: 10.3233/JIFS-169126

Abstract

In recent years, information extraction from tweets has been challenging for researchers in the fields of knowledge discovery and data mining. Unlike formal text such as news articles and other longer content, tweets are short, noisy, and dynamic in content. Thus, it is difficult to apply traditional natural language processing algorithms to analyze them. Active learning is well suited to many problems in natural language processing, especially when unlabeled data is abundant but labeled data is limited. The method proposed here aims to minimize annotation costs while maximizing model performance. It recognizes named entities in tweet streams on Twitter by using active learning with different query strategies. Tweets are selected for labeling by a human annotator based on query-by-committee, uncertainty-based sampling, and diversity-based sampling. In experimental evaluations on tweet data, the proposed method achieved better results than random sampling.
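
The query strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one label distribution and one set of committee votes per tweet (a real named entity recognizer scores token sequences), it uses least confidence and vote entropy as common textbook scoring functions, and it leaves out diversity-based sampling. All function names, tweet ids, and numbers are hypothetical.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def vote_entropy(votes):
    """Query-by-committee disagreement: entropy of the committee's label votes."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def select_queries(pool, score, k):
    """Return the ids of the k highest-scoring items in the unlabeled pool."""
    ranked = sorted(pool, key=lambda item_id: score(pool[item_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy pool: per-tweet label probabilities from a single model.
model_probs = {
    "t1": [0.90, 0.05, 0.05],
    "t2": [0.40, 0.35, 0.25],
    "t3": [0.60, 0.30, 0.10],
}

# Hypothetical toy pool: per-tweet labels predicted by three committee members.
committee_votes = {
    "t1": ["PER", "PER", "PER"],
    "t2": ["PER", "LOC", "ORG"],
    "t3": ["PER", "PER", "LOC"],
}

# Tweets to send to the human annotator under each strategy.
uncertain = select_queries(model_probs, least_confidence, 2)
disputed = select_queries(committee_votes, vote_entropy, 2)
```

In both cases the tweets the model or committee is least sure about are sent to the annotator first, whereas random sampling, the baseline named in the abstract, would pick tweets without regard to these scores.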

  • Publication date: 2017