摘要

A new histogram-based mutual information estimator using data-driven tree-structured partitions (TSP) is presented in this paper. The derived TSP is a solution to a complexity regularized empirical information maximization, with the objective of finding a good tradeoff between the known estimation and approximation errors. A distribution-free concentration inequality for this tree-structured learning problem as well as finite sample performance bounds for the proposed histogram-based solution is derived. It is shown that this solution is density-free strongly consistent and that it provides, with an arbitrary high probability, an optimal balance between the mentioned estimation and approximation errors. Finally, for the emblematic scenario of independence, I(X;Y), it is shown that the TSP estimate converges to zero with O(e(-n1/3+log log n)).

  • 出版日期2012-3

全文