A Provably Efficient Algorithm for Separable Topic Discovery

Ding Weicong<sup>*</sup>; Ishwar Prakash<sup>*</sup>; Saligrama Venkatesh<sup>*</sup>

doi:10.1109/JSTSP.2016.2555240

Abstract

We develop necessary and sufficient conditions and a novel provably consistent and efficient algorithm for discovering topics (latent factors) from observations (documents) that are realized from a probabilistic mixture of shared latent factors that have certain properties. Our focus is on the class of topic models in which each shared latent factor contains a novel word that is unique to that factor, a property that has come to be known as separability. Our algorithm is based on the key insight that the novel words correspond to the extreme points of the convex hull formed by the row-vectors of a suitably normalized word co-occurrence matrix. We leverage this geometric insight to establish polynomial computational and sample complexity bounds based on a few isotropic random projections of the rows of the normalized word co-occurrence matrix. Our proposed random-projections-based algorithm is naturally amenable to an efficient distributed implementation and is attractive for modern web-scale distributed data mining applications.
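The geometric idea in the abstract can be illustrated with a minimal sketch: when a set of points is projected onto a random isotropic direction, the point with the largest projection is (almost surely) an extreme point of the convex hull. The code below is not the authors' algorithm, which also estimates and normalizes the word co-occurrence matrix from documents and handles sampling noise. The function name `count_projection_maximizers`, the toy data, and the number of projections are all assumptions made for illustration.

```python
import numpy as np


def count_projection_maximizers(X, num_projections=200, seed=0):
    """Count how often each row of X maximizes a random isotropic projection.

    Rows with a nonzero count are candidate extreme points of the convex
    hull of the rows of X. (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng(seed)
    # Each column is one isotropic Gaussian direction.
    directions = rng.standard_normal((X.shape[1], num_projections))
    # For every direction, find the row with the largest projection.
    winners = np.argmax(X @ directions, axis=0)
    return np.bincount(winners, minlength=X.shape[0])


# Toy data (assumed): K "novel word" rows sit at the simplex vertices,
# all other rows are strict convex combinations of those vertices.
K = 3
vertices = np.eye(K)
rng = np.random.default_rng(1)
weights = rng.dirichlet(np.ones(K), size=50)
# Pull the mixed rows toward the centroid so they are clearly interior.
interior = 0.8 * weights + 0.2 / K
X = np.vstack([vertices, interior])

counts = count_projection_maximizers(X)
found = sorted(np.flatnonzero(counts).tolist())
print(found)
```

In this toy setting only the first `K` rows can ever win a projection, so the rows with nonzero counts recover the vertex rows. Because each projection is independent of the others, the loop over directions could be split across workers, which is consistent with the abstract's remark about distributed implementation.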

Publication date: 2016-06
