Abstract

This paper presents a numerical study of the bottom-up and top-down inference processes in hierarchical models, using the And-Or graph as an example. Three inference processes are identified for each node A in a recursively defined And-Or graph in which a stochastic context-sensitive image grammar is embedded: the alpha(A) process detects node A directly from image features, the beta(A) process computes node A bottom-up by binding its child node(s), and the gamma(A) process predicts node A top-down from its parent node(s). All three processes contribute to computing node A from images in complementary ways. The objective of our numerical study is to explore how much information each process contributes and how the processes should be integrated to improve performance. We study them in the task of object parsing with an And-Or graph formulated under the Bayesian framework. First, we isolate and train the alpha(A), beta(A), and gamma(A) processes separately by blocking the other two processes. The information contribution of each process is then evaluated individually by its discriminative power and compared against human performance on the same task. Second, we integrate the three processes explicitly for robust inference and propose a greedy pursuit algorithm for object parsing. In experiments, we choose two hierarchical case studies: junctions and rectangles in low-to-middle-level vision, and human faces in high-level vision. We observe that (i) the effectiveness of the alpha(A), beta(A), and gamma(A) processes depends on scale and occlusion conditions, (ii) the alpha(face) process is stronger than the alpha processes of the facial components, while the beta(junction) and beta(rectangle) processes work much better than their alpha counterparts, and (iii) integrating the three processes improves performance in ROC comparisons.
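
The following is a minimal, hypothetical sketch (not the authors' implementation) of how alpha (direct detection), beta (bottom-up binding), and gamma (top-down prediction) scores for a node could be combined in a greedy parsing loop. All names here (`Node`, `alpha_score`, `greedy_parse`, the threshold) are assumptions introduced for illustration; the scoring functions are placeholders standing in for learned models.

```python
# Hypothetical sketch of integrating alpha/beta/gamma channels in greedy pursuit.
# The score functions are placeholders; a real system would use trained detectors.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    accepted: bool = False


def alpha_score(node: Node, image) -> float:
    """alpha(A): detect node A directly from image features (placeholder)."""
    return 0.0


def beta_score(node: Node, image) -> float:
    """beta(A): bind already-accepted children bottom-up (placeholder)."""
    if not any(c.accepted for c in node.children):
        return float("-inf")  # no evidence from children yet
    return 0.0


def gamma_score(node: Node, image) -> float:
    """gamma(A): predict node A top-down from an accepted parent (placeholder)."""
    if node.parent is None or not node.parent.accepted:
        return float("-inf")  # no top-down context available
    return 0.0


def greedy_parse(nodes: List[Node], image, threshold: float = 0.5) -> List[Node]:
    """Greedily accept the node whose best channel score is highest,
    stopping when no remaining node exceeds the acceptance threshold."""
    remaining = [n for n in nodes if not n.accepted]
    while remaining:
        scored = [(max(alpha_score(n, image),
                       beta_score(n, image),
                       gamma_score(n, image)), n) for n in remaining]
        best_score, best_node = max(scored, key=lambda t: t[0])
        if best_score < threshold:
            break
        best_node.accepted = True
        remaining.remove(best_node)
    return [n for n in nodes if n.accepted]
```

The design choice illustrated is only that each candidate node is scored by whichever of its three channels is currently most informative (e.g., beta may dominate for a rectangle once its junctions are accepted, while alpha may dominate for a face seen at sufficient resolution), which mirrors the paper's observation that the channels are complementary.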

  • Publication date: 2011-06