AN ALGORITHM FOR THE PRINCIPAL COMPONENT ANALYSIS OF LARGE DATA SETS

Authors: Halko, Nathan*; Martinsson, Per-Gunnar; Shkolnisky, Yoel; Tygert, Mark
Source: SIAM Journal on Scientific Computing, 2011, 33(5): 2580-2594.
DOI: 10.1137/100804139

Abstract

Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
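
To make the out-of-core idea concrete, the following is a minimal sketch of a generic randomized range-finder with a few power iterations, in which the data matrix is only ever touched one block of rows at a time. It is not the paper's exact procedure. The function name `randomized_pca_blocks`, the `read_blocks` callback, and the default values of `oversample` and `n_iter` are illustrative assumptions, and the data are assumed to be mean-centered already.

```python
import numpy as np


def randomized_pca_blocks(read_blocks, n_cols, k, oversample=10, n_iter=2, seed=0):
    """Approximate rank-k SVD of a matrix A that is read in row blocks.

    read_blocks: hypothetical callback returning a fresh iterator over
                 consecutive row blocks of A, each of shape (b, n_cols).
    Only arrays with l = k + oversample columns or rows are held in memory.
    """
    rng = np.random.default_rng(seed)
    l = k + oversample

    # One pass over A: Y = A G for a Gaussian test matrix G, then orthonormalize.
    G = rng.standard_normal((n_cols, l))
    Q, _ = np.linalg.qr(np.vstack([blk @ G for blk in read_blocks()]))

    # Power iterations, re-orthonormalizing after each product for stability.
    for _ in range(n_iter):
        Z = np.zeros((n_cols, l))
        row = 0
        for blk in read_blocks():
            Z += blk.T @ Q[row:row + blk.shape[0]]  # accumulate A^T Q
            row += blk.shape[0]
        Z, _ = np.linalg.qr(Z)
        Q, _ = np.linalg.qr(np.vstack([blk @ Z for blk in read_blocks()]))

    # Final pass: B = Q^T A is small (l x n_cols), so its SVD is cheap.
    B = np.zeros((l, n_cols))
    row = 0
    for blk in read_blocks():
        B += Q[row:row + blk.shape[0]].T @ blk
        row += blk.shape[0]

    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]
```

In practice `read_blocks` could iterate over slices of a disk-backed array such as one opened with `np.memmap`, so that only one block plus a few matrices with `l` columns or rows reside in RAM at any time. Each power iteration costs two additional passes over the data, which is the main trade-off between accuracy and disk traffic in this sketch.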

  • Publication date: 2011