Abstract

This paper proposes an estimator of mutual information for both discrete and continuous variables and applies it to the Chow-Liu algorithm to find a forest that expresses the probabilistic relations among them. The state-of-the-art method assumes that the continuous variables are Gaussian and that the graphical model over discrete and continuous variables is ANOVA. Under these assumptions, it is difficult to obtain the maximum likelihood for three connected variables in which the center is Gaussian and the other two are discrete, and thus the state-of-the-art method restricts the class to forests in which no Gaussian node lies between discrete variables. The proposed method works in a general setting without any such assumptions: it prepares several meshes, computes the mutual information value for each, and selects the maximum value. We prove that the number of meshes to be prepared is at most O(log² n) and that, for large n, the estimated mutual information is no larger than zero if and only if the variables are independent. Finally, we apply the proposed method to the problems of gene differential analysis and relation discovery between gene expression and SNPs (single nucleotide polymorphisms). In particular, for the latter experiment, we demonstrate that the proposed method successfully captures the relation among them whereas the state-of-the-art method fails, which reflects the merits and demerits of the proposed and existing methods.
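
The forest-construction step can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the pairwise mutual information estimates have already been computed by some estimator and are passed in as a dictionary, and it uses a Kruskal-style maximum-weight spanning forest that discards pairs whose estimate is not positive (i.e., pairs judged independent). The function name `chow_liu_forest` and the input format are hypothetical.

```python
def chow_liu_forest(num_vars, mi):
    """Build a forest from pairwise mutual information estimates.

    num_vars: number of variables, indexed 0..num_vars-1 (assumed format).
    mi: dict mapping a pair (i, j) to its estimated mutual information
        (assumed to be precomputed; the estimator itself is not shown).
    Returns the list of selected edges (i, j).
    """
    # Union-find structure to detect cycles.
    parent = list(range(num_vars))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    # Visit candidate edges in decreasing order of estimated mutual information.
    for (i, j), weight in sorted(mi.items(), key=lambda kv: kv[1], reverse=True):
        if weight <= 0:
            # Non-positive estimate: treat the pair as independent, so no edge.
            # All remaining candidates are no larger, so we can stop here.
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Adding this edge does not create a cycle.
            parent[root_i] = root_j
            forest.append((i, j))
    return forest


# Hypothetical estimates for four variables; these are made-up numbers.
example_mi = {(0, 1): 0.5, (0, 2): 0.4, (1, 2): 0.3, (2, 3): -0.1}
example_forest = chow_liu_forest(4, example_mi)
```

With these made-up values, variable 3 stays isolated because its only candidate edge has a non-positive estimate, so the result is a forest rather than a spanning tree.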

  • Publication date: 2017-1