摘要

Objectives: We propose a new graphical framework for extracting the relevant dietary, social and environmental risk factors that are associated with an increased risk of nasopharyngeal carcinoma (NPC) on a case-control epidemiologic study that consists of 1289 subjects and 150 risk factors. %26lt;br%26gt;Methods: This framework builds on the use of Bayesian networks (BNs) for representing statistical dependencies between the random variables. We discuss a novel constraint-based procedure, called Hybrid Parents and Children (HPC), that builds recursively a local graph that includes all the relevant features statistically associated to the NPC, without having to find the whole BN first. The local graph is afterwards directed by the domain expert according to his knowledge. It provides a statistical profile of the recruited population, and meanwhile helps identify the risk factors associated to NPC. %26lt;br%26gt;Results: Extensive experiments on synthetic data sampled from known BNs show that the HPC outperforms state-of-the-art algorithms that appeared in the recent literature. From a biological perspective, the present study confirms that chemical products, pesticides and domestic fume intake from incomplete combustion of coal and wood are significantly associated with NPC risk. These results suggest that industrial workers are often exposed to noxious chemicals and poisonous substances that are used in the course of manufacturing. This study also supports previous findings that the consumption of a number of preserved food items, like house made proteins and sheep fat, are a major risk factor for NPC. %26lt;br%26gt;Conclusion: BNs are valuable data mining tools for the analysis of epidemiologic data. They can explicitly combine both expert knowledge from the field and information inferred from the data. These techniques therefore merit consideration as valuable alternatives to traditional multivariate regression techniques in epidemiologic studies.

  • 出版日期2012-1