A named entity recognition approach for tweet streams using active learning

Van Cuong Tran; Dinh Tuyen Hoang; Ngoc Thanh Nguyen; Hwang Dosam*

doi:10.3233/JIFS-169126

Abstract

In recent years, information extraction from tweets has been a challenge for researchers in knowledge discovery and data mining. Unlike formal text, such as news articles and other longer content, tweets are short and noisy, and their content is dynamic. It is therefore difficult to analyze them with traditional natural language processing algorithms. Active learning is well suited to many problems in natural language processing, especially when unlabeled data is abundant but labeled data is limited. The method proposed here aims to minimize annotation cost while maximizing model performance. It recognizes named entities in tweet streams on Twitter by using active learning with different query strategies. Tweets are selected for labeling by a human annotator based on query-by-committee, uncertainty-based sampling, and diversity-based sampling. In experimental evaluations on tweet data, the proposed method achieved better results than random sampling.
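As a rough illustration of two of the query strategies named in the abstract, the sketch below scores tweets by uncertainty (least confidence) and by committee disagreement (vote entropy), then picks the highest-scoring tweets for annotation. It is not the authors' implementation: the function names, the tweet-level averaging over tokens, and the input layouts are all assumptions made for this example, and diversity-based sampling is not covered.

```python
import math
from collections import Counter


def least_confidence(token_probs):
    """Uncertainty score for one tweet (hypothetical helper).

    token_probs[t] is the model's label distribution for token t.
    Returns the mean of (1 - max label probability) over tokens.
    """
    if not token_probs:
        return 0.0
    return sum(1.0 - max(p) for p in token_probs) / len(token_probs)


def vote_entropy(committee_labels):
    """Committee disagreement score for one tweet (hypothetical helper).

    committee_labels[m][t] is the label that committee member m assigns
    to token t. Returns the mean per-token entropy of the votes.
    """
    n_models = len(committee_labels)
    n_tokens = len(committee_labels[0]) if n_models else 0
    if n_tokens == 0:
        return 0.0
    total = 0.0
    for t in range(n_tokens):
        votes = Counter(member[t] for member in committee_labels)
        total -= sum(
            (c / n_models) * math.log(c / n_models) for c in votes.values()
        )
    return total / n_tokens


def select_queries(scores, k):
    """Indices of the k highest-scoring tweets; ties go to the lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]
```

In an active learning loop under these assumptions, the selected indices would be sent to the human annotator, the newly labeled tweets added to the training set, and the model retrained before the next round of scoring.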

Publication date: 2017
