A multi-level approach to highly efficient recognition of Chinese spam short messages

Wang, Weimin<sup>*</sup>; Zhou, Dan

doi:10.1007/s11704-016-5415-8

Abstract

The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution can not only improve people's mobile experience but also promote the analysis of the short texts found in current mobile applications, such as Webchat and microblogs. Because spam SMSes are characterized by sparsity, transformation, and a real-time nature, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. Combining these methods yields a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method that uses the quasi-pattern matching results produced during pattern matching. The method can learn many new and interesting patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision of 95.18% and a recall of 95.51%.
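The three levels named in the abstract can be pictured as a cascade in which each message passes through progressively more specific checks. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names, the thresholds, the character-bigram Jaccard similarity, the sample spam message, and the regular expression are all hypothetical stand-ins for whatever features, similarity measure, and pattern base the paper actually uses.

```python
import re

# Hypothetical data: one known spam sample and one hand-written pattern.
# Neither comes from the paper; they exist only to make the sketch runnable.
KNOWN_SPAM = ["恭喜您中奖了请点击链接领取奖金"]
SPAM_PATTERNS = [re.compile(r"中.{0,2}奖.*(链接|电话)")]


def symbol_ratio(text: str) -> float:
    """Fraction of characters that are not CJK ideographs,
    ASCII letters/digits, or whitespace."""
    if not text:
        return 0.0

    def is_ordinary(ch: str) -> bool:
        is_cjk = "\u4e00" <= ch <= "\u9fff"
        is_ascii_alnum = ch.isascii() and ch.isalnum()
        return is_cjk or is_ascii_alnum or ch.isspace()

    odd = sum(1 for ch in text if not is_ordinary(ch))
    return odd / len(text)


def bigrams(text: str) -> set:
    """Set of overlapping character bigrams in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' character-bigram sets."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def classify(text: str, symbol_threshold: float = 0.3,
             sim_threshold: float = 0.5) -> str:
    """Run the three assumed levels in order and report which one fired."""
    # Level 1: symbolic features (assumed here: share of unusual symbols).
    if symbol_ratio(text) > symbol_threshold:
        return "spam:symbolic"
    # Level 2: text similarity against known spam samples.
    if any(jaccard(text, s) >= sim_threshold for s in KNOWN_SPAM):
        return "spam:similarity"
    # Level 3: pattern matching against a pattern base.
    if any(p.search(text) for p in SPAM_PATTERNS):
        return "spam:pattern"
    return "ham"
```

Ordering the checks from cheapest to most specific is a design choice of this sketch; the abstract does not say in which order the levels are applied or how their results are combined.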

Publication date: 2018-02
Affiliations: University of Chinese Academy of Sciences; Jiangsu University of Science and Technology

