Abstract

We compute exact distributions of statistics of hidden state sequences in general settings. Distributions are computed for undirected and directed graphical models that are represented using conditional random fields and factor graphs. The methods discussed apply to graphs whose edges are sparse enough to allow exact computation of the normalization constant. The distributions are obtained efficiently by integrating sequential updates of the statistic's value with the sum-product algorithm. Applications of this work include discrete hidden state sequences perturbed by noise and/or missing values, and state sequences that serve to classify observations. In the case of classification, the methods give a way to quantify the uncertainty in statistics associated with the classifications. The algorithm is applied to model-based false discovery distributions for protein-protein interactions and to distributions related to CpG island lengths in DNA sequences.
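As a rough illustration of the idea of carrying a statistic's value through sum-product messages, the sketch below handles only the simplest case: a chain-structured hidden Markov model, with the statistic taken to be the number of positions spent in one target state. The model parameters, the function names, and the choice of statistic are all assumptions made for this example and are not taken from the paper. The forward message is indexed by both the hidden state and the running count, and the result is cross-checked against brute-force enumeration of all state paths.

```python
from itertools import product

def statistic_distribution(init, trans, emit, obs, target=1):
    """Exact P(count = c | obs), where count is the number of positions
    in `target`, for a chain HMM (hypothetical example setting)."""
    n_states, T = len(init), len(obs)
    # alpha[s][c] = P(obs[0..t], state_t = s, count so far = c)
    alpha = [[0.0] * (T + 1) for _ in range(n_states)]
    for s in range(n_states):
        alpha[s][1 if s == target else 0] = init[s] * emit[s][obs[0]]
    for t in range(1, T):
        new = [[0.0] * (T + 1) for _ in range(n_states)]
        for s in range(n_states):
            inc = 1 if s == target else 0  # sequential update of the statistic
            for r in range(n_states):
                w = trans[r][s] * emit[s][obs[t]]
                for c in range(T + 1 - inc):
                    new[s][c + inc] += alpha[r][c] * w
        alpha = new
    joint = [sum(alpha[s][c] for s in range(n_states)) for c in range(T + 1)]
    z = sum(joint)  # normalization constant P(obs)
    return [p / z for p in joint]

def brute_force(init, trans, emit, obs, target=1):
    """Same distribution by enumerating every state path (check only)."""
    n_states, T = len(init), len(obs)
    joint = [0.0] * (T + 1)
    for path in product(range(n_states), repeat=T):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, T):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        joint[path.count(target)] += p
    z = sum(joint)
    return [p / z for p in joint]

# Illustrative parameters only (not from the paper).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

dist = statistic_distribution(init, trans, emit, obs)
check = brute_force(init, trans, emit, obs)
```

In this sketch the augmented recursion costs on the order of T² times the square of the number of states, whereas the enumeration grows exponentially in T, which is why the latter is used only as a check on short sequences.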

  • Publication date: 2013-12