A novel framework for semantic entity identification and relationship integration in large scale text data

Wang, Dingxian; Liu, Xiao*; Luo, Hangzai; Fan, Jianping

doi:10.1016/j.future.2015.08.003

Abstract

Semantic entities carry the most important semantics of text data, so identifying them and integrating the relationships among them is essential for applications that depend on text semantics. However, current strategies still struggle with semantic entity identification, new word identification, and relationship integration among semantic entities. To address these problems, this paper proposes a two-phase framework for semantic entity identification with relationship integration in large-scale text data. In the first phase, semantic entity identification, we propose a novel strategy that extracts unknown semantic entities from text by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. Compared with traditional approaches, our strategy is more effective at detecting semantic entities and more sensitive to new entities that have only just appeared in fresh data. In the second phase, the framework integrates Semantic Entity Relationships (SER), which helps to cluster the extracted semantic entities. A novel classification method using features such as similarity measures and co-occurrence probabilities is applied to tackle the clustering problem and discover the relationships among semantic entities. Comprehensive experimental results show that our framework outperforms state-of-the-art strategies in semantic entity identification and discovers over 80% of the relationship pairs among related semantic entities in large-scale text data.

Publication date: 2016-11
Affiliations: Northwest University; East China Normal University

