Abstract

Principal component analysis (PCA) is a fundamental primitive of many data analysis, array processing, and machine learning methods. In applications that involve extremely large arrays of data, particularly in distributed data acquisition systems, distributed PCA algorithms can harness local communications and network connectivity to avoid the need to communicate the entire array to, and access it at, a single location. A key feature of distributed PCA algorithms is that they defy the conventional notion that the first step toward computing the principal vectors is to form a sample covariance. This paper surveys the methodologies for performing distributed PCA on different data sets, their performance, and their applications in the context of distributed data acquisition systems.