Analysis of nasopharyngeal carcinoma risk factors with Bayesian networks

Cited by: 16
Authors
Aussem, Alex [1, 2]
de Morais, Sergio Rodrigues [1, 2]
Corbex, Marilys [3]
Affiliations
[1] Univ Lyon 1, Dept Comp Sci, Graph Theory Machine Learning & Multiagent Syst L, F-69622 Villeurbanne, France
[2] Univ Lyon, F-69000 Lyon, France
[3] Int Agcy Res Canc, F-69280 Lyon, France
Keywords
Machine learning; Predictive modeling; Bayesian networks; Feature selection; Epidemiology; Nasopharyngeal carcinoma; MARKOV BLANKET INDUCTION; FEATURE-SELECTION; CAUSAL DISCOVERY; LOCAL CAUSAL
DOI
10.1016/j.artmed.2011.09.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objectives: We propose a new graphical framework for extracting the dietary, social and environmental risk factors that are associated with an increased risk of nasopharyngeal carcinoma (NPC) in a case-control epidemiologic study comprising 1289 subjects and 150 risk factors. Methods: The framework uses Bayesian networks (BNs) to represent statistical dependencies between the random variables. We discuss a novel constraint-based procedure, called Hybrid Parents and Children (HPC), that recursively builds a local graph containing all the relevant features statistically associated with NPC, without having to find the whole BN first. The edges of the local graph are then oriented by the domain expert according to his or her knowledge. The graph provides a statistical profile of the recruited population and, at the same time, helps identify the risk factors associated with NPC. Results: Extensive experiments on synthetic data sampled from known BNs show that HPC outperforms state-of-the-art algorithms from the recent literature. From a biological perspective, the present study confirms that chemical products, pesticides and domestic fume intake from incomplete combustion of coal and wood are significantly associated with NPC risk. These results suggest that industrial workers are often exposed to noxious chemicals and poisonous substances used in the course of manufacturing. This study also supports previous findings that the consumption of a number of preserved food items, such as house-made proteins and sheep fat, is a major risk factor for NPC. Conclusion: BNs are valuable data mining tools for the analysis of epidemiologic data. They can explicitly combine expert knowledge from the field with information inferred from the data. These techniques therefore merit consideration as alternatives to traditional multivariate regression techniques in epidemiologic studies. (C) 2011 Elsevier B.V. All rights reserved.
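The abstract does not give the details of HPC, so the following is not the authors' algorithm. It is only a minimal sketch, in Python, of the kind of statistical association test that constraint-based methods generally rely on: a Pearson chi-square test of independence between each binary risk factor and case status. The variable names (`npc_case`, `exposure_a`, `exposure_b`), the toy data and the 0.05 significance level are all assumptions made for illustration, and a real constraint-based procedure would also run conditional independence tests, which this sketch omits.

```python
from itertools import product

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
# The significance level is an assumption for this sketch.
CHI2_CRIT_DF1_ALPHA05 = 3.841


def chi2_2x2(xs, ys):
    """Pearson chi-square statistic for two binary (0/1) sequences of equal length."""
    n = len(xs)
    counts = {cell: 0 for cell in product((0, 1), repeat=2)}
    for x, y in zip(xs, ys):
        counts[(x, y)] += 1
    stat = 0.0
    for (x, y), observed in counts.items():
        row_total = counts[(x, 0)] + counts[(x, 1)]
        col_total = counts[(0, y)] + counts[(1, y)]
        expected = row_total * col_total / n
        if expected > 0:  # skip empty margins to avoid division by zero
            stat += (observed - expected) ** 2 / expected
    return stat


def screen(data, target):
    """Names of variables whose marginal association with `target` is significant."""
    return sorted(
        name
        for name, values in data.items()
        if name != target
        and chi2_2x2(values, data[target]) > CHI2_CRIT_DF1_ALPHA05
    )


# Hypothetical toy data: 20 cases followed by 20 controls.
toy = {
    "npc_case": [1] * 20 + [0] * 20,
    # Exposed: 16 of 20 cases, 4 of 20 controls (associated by construction).
    "exposure_a": [1] * 16 + [0] * 4 + [1] * 4 + [0] * 16,
    # Exposed: 10 of 20 cases, 10 of 20 controls (independent by construction).
    "exposure_b": [1] * 10 + [0] * 10 + [1] * 10 + [0] * 10,
}

selected = screen(toy, "npc_case")
print(selected)
```

On this toy data only `exposure_a` passes the screen. Marginal screening of this kind can miss factors that matter only in combination and can keep factors that are merely confounded with a true cause, which is one reason local graph methods go beyond it.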
Pages: 53-62
Number of pages: 10
Related papers
37 in total
[1]  
Aliferis C, 2003, P INT C MATH ENG TEC, P23
[2]  
Aliferis CF, 2010, J MACH LEARN RES, V11, P171
[3]  
Aliferis CF, 2010, J MACH LEARN RES, V11, P235
[4]  
[Anonymous], 2008, Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings
[5]  
[Anonymous], 2004, Learning Bayesian Networks
[6]   Nasopharyngeal carcinoma in Malaysian Chinese: occupational exposures to particles, formaldehyde and heat [J].
Armstrong, RW ;
Imrey, PB ;
Lye, MS ;
Armstrong, MJ ;
Yu, MC ;
Sani, S .
INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2000, 29 (06) :991-998
[7]   Analysis of lifestyle and metabolic predictors of visceral obesity with Bayesian Networks [J].
Aussem, Alex ;
Tchernof, Andre ;
de Morais, Sergio Rodrigues ;
Rome, Sophie .
BMC BIOINFORMATICS, 2010, 11
[8]   A conservative feature subset selection algorithm with missing data [J].
Aussem, Alex ;
de Morais, Sergio Rodrigues .
NEUROCOMPUTING, 2010, 73 (4-6) :585-590
[9]   Feature selection in Bayesian classifiers for the prognosis of survival of cirrhotic patients treated with TIPS [J].
Blanco, R ;
Inza, M ;
Merino, M ;
Quiroga, J ;
Larrañaga, P .
JOURNAL OF BIOMEDICAL INFORMATICS, 2005, 38 (05) :376-388
[10]  
Bromberg F, 2006, SIAM PROC S, P141