Number of Relevant Directions in Principal Component Analysis and Wishart Random Matrices

Authors: Majumdar Satya N*; Vivo Pierpaolo
Source: Physical Review Letters, 2012, 108(20): 200601.
DOI:10.1103/PhysRevLett.108.200601

Abstract

We compute analytically, for large $N$, the probability $P(N_+, N)$ that an $N \times N$ Wishart random matrix has $N_+$ eigenvalues exceeding a threshold $N\zeta$, including its large deviation tails. This probability plays a benchmark role when performing the principal component analysis of a large empirical data set. We find that $P(N_+, N) \approx \exp[-\beta N^2 \psi_\zeta(N_+/N)]$, where $\beta$ is the Dyson index of the ensemble and $\psi_\zeta(\kappa)$ is a rate function that we compute explicitly in the full range $0 \le \kappa \le 1$ and for any $\zeta$. The rate function $\psi_\zeta(\kappa)$ displays a quadratic behavior modulated by a logarithmic singularity close to its minimum $\kappa^*(\zeta)$. This is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance $\Delta(N)$ of the number of relevant components is also shown to grow universally (independent of $\zeta$) as $\Delta(N) \sim (\beta\pi^2)^{-1} \ln N$ for large $N$.
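The typical behaviour described above can be checked numerically with a small Monte Carlo sketch. It rests on assumptions not stated in the abstract: $W = X^T X$ with $X$ a real $N \times N$ matrix of i.i.d. standard Gaussian entries ($\beta = 1$), so that the eigenvalues of $W/N$ follow the Marchenko-Pastur law on $[0, 4]$ with density $\sqrt{(4-x)/x}/(2\pi)$. Under that scaling, the fraction of eigenvalues above $\zeta$ (which should coincide with the minimum $\kappa^*(\zeta)$ of the rate function) has the closed form $1 - 2\theta_0/\pi - \sin(2\theta_0)/\pi$ with $\theta_0 = \arcsin\sqrt{\zeta/4}$. The helper names `typical_fraction` and `sample_n_plus` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def typical_fraction(zeta: float) -> float:
    """Fraction of Marchenko-Pastur eigenvalues above zeta (assumed kappa*(zeta)).

    Assumes the square case: eigenvalues of W / N lie on [0, 4] with density
    sqrt((4 - x) / x) / (2 pi). Closed form via the substitution x = 4 sin^2(theta).
    """
    if zeta <= 0.0:
        return 1.0
    if zeta >= 4.0:
        return 0.0
    theta0 = np.arcsin(np.sqrt(zeta / 4.0))
    return float(1.0 - 2.0 * theta0 / np.pi - np.sin(2.0 * theta0) / np.pi)


def sample_n_plus(n: int, zeta: float, trials: int, seed: int = 0) -> np.ndarray:
    """Monte Carlo samples of N_+ = number of eigenvalues of W exceeding n * zeta.

    W = X^T X, with X an n x n matrix of i.i.d. real standard Gaussians (beta = 1).
    """
    rng = np.random.default_rng(seed)
    counts = np.empty(trials, dtype=np.int64)
    for t in range(trials):
        x = rng.standard_normal((n, n))
        eigenvalues = np.linalg.eigvalsh(x.T @ x)  # W is symmetric, so eigenvalues are real
        counts[t] = np.count_nonzero(eigenvalues > n * zeta)
    return counts


if __name__ == "__main__":
    n, zeta, beta = 200, 1.0, 1
    counts = sample_n_plus(n, zeta, trials=300)
    print("mean N_+/N        :", counts.mean() / n)
    print("kappa*(zeta)      :", typical_fraction(zeta))
    print("sample variance   :", counts.var(ddof=1))
    print("ln N / (beta pi^2):", np.log(n) / (beta * np.pi ** 2))
```

The sample mean of $N_+/N$ should sit close to $\kappa^*(\zeta)$, while the sample variance should only roughly track $\ln N/(\beta\pi^2)$, since the leading logarithm is accompanied by $O(1)$ corrections at moderate $N$. Direct sampling like this does not reach the $\exp[-\beta N^2 \psi_\zeta]$ large-deviation tails; those require the analytical Coulomb gas treatment.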

  • Publication date: 2012-05-18