A three-way approach for learning rules in automatic knowledge-based topic models

Authors: Khan Muhammad Taimoor*; Azam Nouman; Khalid Shehzad; Yao JingTao
Source: International Journal of Approximate Reasoning, 2017, 82: 210-226.
DOI:10.1016/j.ijar.2016.12.011

Abstract

Topic modeling aims to uncover hidden thematic structures in a collection of documents by representing them as a set of topics. Automatic knowledge-based topic models have recently been introduced to meet the demands of processing large-scale text collections. They are based on rules extracted automatically from multiple domain corpora. The extracted rules are generally numerous, so thresholds are used to select only a small number of useful rules. This way of selecting important rules has two shortcomings. First, the same fixed thresholds are used to extract rules from all domain corpora. Second, the thresholds are predefined or set explicitly from expert opinion rather than determined by an automated mechanism. In this article, we address these shortcomings with a three-way approach that distinguishes rules having strong positive associations, rules having strong negative associations and rules having weak associations. A pair of thresholds defines and controls this three-way partition of the rules. We argue that domain-specific, automated selection of the thresholds in the three-way framework can be approached as a tradeoff between the quantity and the quality of rules. We apply the game-theoretic rough set (GTRS) model to implement this tradeoff and introduce GTRS-based algorithms for determining the thresholds automatically. Experimental results on the Chen2014 dataset suggest an average improvement of 52.82 points in topic coherence when the quantity of rules is increased to 17.93%.
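The abstract does not state the association measure or exactly how the two thresholds are applied, so the following is only a minimal Python sketch under stated assumptions: each rule is a word pair with a hypothetical association score in [-1, 1], scores at or above alpha form the strong positive region, scores at or below beta form the strong negative region, and everything in between is the weak region. The names `three_way_partition`, `quantity`, `quality` and `SCORES` are hypothetical, and using the mean absolute score as a quality proxy is an illustrative choice, not the paper's measure. The sketch shows the quantity-quality tradeoff that the GTRS model is used to resolve; it does not implement the GTRS game itself.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # hypothetical rule: a pair of words from a domain corpus

# Hypothetical association scores, assumed to lie in [-1, 1]:
# positive means the words tend to co-occur, negative means they tend not to.
SCORES: Dict[Rule, float] = {
    ("price", "cost"): 0.8,
    ("battery", "charge"): 0.6,
    ("screen", "display"): 0.3,
    ("lens", "zoom"): 0.1,
    ("battery", "lens"): -0.2,
    ("price", "zoom"): -0.7,
}


def three_way_partition(
    scores: Dict[Rule, float], alpha: float, beta: float
) -> Tuple[List[Rule], List[Rule], List[Rule]]:
    """Split rules into positive, negative and weak regions (assumes beta < alpha)."""
    if not beta < alpha:
        raise ValueError("expected beta < alpha")
    positive: List[Rule] = []
    negative: List[Rule] = []
    weak: List[Rule] = []
    for rule, score in scores.items():
        if score >= alpha:
            positive.append(rule)  # strong positive association
        elif score <= beta:
            negative.append(rule)  # strong negative association
        else:
            weak.append(rule)  # weak association, not kept as a useful rule
    return positive, negative, weak


def quantity(scores: Dict[Rule, float], alpha: float, beta: float) -> float:
    """Fraction of all rules that fall in the positive or negative region."""
    positive, negative, _ = three_way_partition(scores, alpha, beta)
    return (len(positive) + len(negative)) / len(scores)


def quality(scores: Dict[Rule, float], alpha: float, beta: float) -> float:
    """Hypothetical quality proxy: mean absolute score of the kept rules."""
    positive, negative, _ = three_way_partition(scores, alpha, beta)
    kept = positive + negative
    if not kept:
        return 0.0
    return sum(abs(scores[rule]) for rule in kept) / len(kept)


# Moving both thresholds towards zero keeps more rules, but weaker ones on average.
for alpha, beta in [(0.5, -0.5), (0.25, -0.15)]:
    print(
        f"alpha={alpha}, beta={beta}: "
        f"quantity={quantity(SCORES, alpha, beta):.2f}, "
        f"quality={quality(SCORES, alpha, beta):.2f}"
    )
```

With these example scores, moving the thresholds from (0.5, -0.5) to (0.25, -0.15) raises the quantity from 0.50 to 0.83 while the quality proxy falls from 0.70 to 0.52. The approach summarized above settles this kind of competition with the GTRS model instead of hand-picked thresholds.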

  • Publication date: 2017-3