A three-way approach for learning rules in automatic knowledge-based topic models

Khan Muhammad Taimoor<sup>*</sup>; Azam Nouman; Khalid Shehzad; Yao JingTao

doi:10.1016/j.ijar.2016.12.011

Abstract

Topic modeling aims to uncover hidden thematic structures in a collection of documents by representing them as a set of topics. Automatic knowledge-based topic models were recently introduced to meet the demands of processing large-scale text collections. They rely on the automatic extraction of rules from multiple domain corpora. The extracted rules are generally large in number, so thresholds are used to select only a small number of useful rules. This way of selecting important rules has two shortcomings. First, the same fixed thresholds are used to extract rules from all domain corpora. Second, the thresholds are predefined or explicitly set by expert opinion rather than determined by an automated mechanism. In this article, we address these shortcomings with a three-way approach that distinguishes rules having strong positive associations, rules having strong negative associations, and rules having weak associations. A pair of thresholds defines and controls this three-way partitioning of the rules. We argue that the domain-specific, automated selection of thresholds in the three-way framework may be approached as a tradeoff between the quantity of rules and the quality of rules. We apply the game-theoretic rough set (GTRS) model to implement this tradeoff and introduce GTRS-based algorithms for automatically determining the thresholds. Experimental results on the Chen2014 dataset suggest an average improvement of 52.82 points in topic coherence by increasing the quantity of rules to 17.93%.
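As an illustration of the three-way partitioning described in the abstract, the following is a minimal sketch only. The abstract does not specify the association measure, its range, or how ties at the thresholds are handled, so the numeric scores, the function name `three_way_partition`, the parameter names `alpha` and `beta`, and the inclusive comparisons are all assumptions made for this example. The GTRS-based threshold selection is not reproduced; the thresholds are simply passed in.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # hypothetical representation: a pair of associated words


def three_way_partition(
    scores: Dict[Rule, float], alpha: float, beta: float
) -> Tuple[List[Rule], List[Rule], List[Rule]]:
    """Split rules into three regions using a pair of thresholds.

    Assumption: a higher score means a stronger positive association and a
    lower score means a stronger negative association, with beta < alpha.
    """
    if beta >= alpha:
        raise ValueError("beta must be strictly smaller than alpha")

    strong_positive: List[Rule] = []
    strong_negative: List[Rule] = []
    weak: List[Rule] = []

    for rule, score in scores.items():
        if score >= alpha:
            strong_positive.append(rule)   # strong positive association
        elif score <= beta:
            strong_negative.append(rule)   # strong negative association
        else:
            weak.append(rule)              # weak association, between the thresholds

    return strong_positive, strong_negative, weak


# Illustrative scores only; these are not values from the paper.
example_scores = {
    ("battery", "charge"): 0.8,
    ("battery", "screen"): -0.6,
    ("battery", "price"): 0.1,
}
positive, negative, weak = three_way_partition(example_scores, alpha=0.5, beta=-0.5)
```

In this sketch, moving `alpha` and `beta` closer together admits more rules into the two strong regions, while moving them apart keeps only the most strongly associated rules. That is the quantity versus quality tradeoff the abstract says the thresholds control.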

Publication date: 2017-03

