Dimensionality reduction for documents with nearest neighbor queries

Ingram Stephen<sup>*</sup>; Munzner Tamara

doi:10.1016/j.neucom.2014.07.073

摘要

Document collections are often stored as sets of sparse, high-dimensional feature vectors. Performing dimensionality reduction (DR) on such high-dimensional datasets for the purposes of visualization presents algorithmic and qualitative challenges for existing DR techniques. We propose the Q-SNE algorithm for dimensionality reduction of document data, combining the scalable probability-based layout approach of BH-SNE with an improved component to calculate approximate nearest neighbors, using the query-based APQ approach that exploits an impact-ordered inverted file. We provide thorough experimental evidence that Q-SNE yields substantial quality improvements for layouts of large document collections with commensurate speed. Our experiments were conducted with six real-world benchmark datasets that range up to millions of documents and terms, and compare against three alternatives for nearest neighbor search and five alternatives for dimensionality reduction.

出版日期2015-2-20

全文

访问全文

收藏分享被引(22) 浏览

更新时间：2021-04-14 06:45

Dimensionality reduction for documents with nearest neighbor queries

摘要

全文

产品服务

站内浏览

服务支持

联系方式

科研之友