Impact of censoring on learning Bayesian networks in survival modelling

Cited by: 32
Authors
Stajduhar, Ivan [1]
Dalbelo-Basic, Bojana [2]
Bogunovic, Nikola [2]
Affiliations
[1] Univ Rijeka, Fac Engn, Dept Automat Elect & Comp, Rijeka 51000, Croatia
[2] Univ Zagreb, Dept Elect Microelect Comp & Intelligent Syst, Fac Elect Engn & Comp, Zagreb 10000, Croatia
Keywords
Bayesian networks; Structure learning; Survival analysis; Censoring; Prognostic models in medicine; Medical decision support; BREAST-CANCER PATIENTS; NEURAL-NETWORK; STATISTICAL-DATA; PROGNOSIS; MANAGEMENT; REGRESSION; SEPARATION; LYMPHOMA; MEDICINE; SURGERY
DOI
10.1016/j.artmed.2009.08.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: Bayesian networks are commonly used for representing uncertainty and covariate interactions in an easily interpretable way. Because of their efficient inference and ability to represent causal relationships, they are an excellent choice for medical decision support systems in diagnosis, treatment, and prognosis. Although good procedures for learning Bayesian networks from data have been defined, their performance in learning from censored survival data has not been widely studied. In this paper, we explore how to use these procedures to learn about possible interactions between prognostic factors and their influence on the variate of interest. We study how censoring affects the probability of learning correct Bayesian network structures. Additionally, we analyse the potential usefulness of the learnt models for predicting the time-independent probability of an event of interest.
Methods and materials: We analysed the influence of censoring with a simulation on synthetic data sampled from randomly generated Bayesian networks. We used two well-known methods for learning Bayesian networks from data: a constraint-based method and a score-based method. We compared the performance of each method under different levels of censoring to that of the naive Bayes classifier and the proportional hazards model. We did additional experiments on several datasets from real-world medical domains. The machine-learning methods treated censored cases in the data as event-free.
Results: We report and compare results for several commonly used model evaluation metrics. On average, the proportional hazards method outperformed other methods in most censoring setups. As part of the simulation study, we also analysed structural similarities of the learnt networks. Heavy censoring, as opposed to no censoring, produces up to 5% surplus arcs and up to 10% missing arcs in total. It also produces up to 50% missing arcs that should originally be connected to the variate of interest.
Conclusion: The presented methods for learning Bayesian networks from data can be used to learn from censored survival data in the presence of light censoring (up to 20%) by treating censored cases as event-free. Given intermediate or heavy censoring, the learnt models become tuned to the majority class and would thus require a different approach. © 2009 Elsevier B.V. All rights reserved.
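The effect of treating censored cases as event-free can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup: the exponential event and censoring times, the fixed horizon, and the function names `naive_labels` and `event_fraction` are all assumptions introduced here for illustration.

```python
import random


def naive_labels(n, censor_rate, horizon=1.0, seed=0):
    """Simulate n subjects and return (true_labels, naive_labels).

    Assumptions (hypothetical, for illustration only):
    - event times are exponential with rate 1.0;
    - censoring times are exponential with rate `censor_rate`
      (a rate of 0 means no censoring);
    - the true label is 1 if the event occurs before `horizon`;
    - the naive label treats every censored case as event-free.
    """
    rng = random.Random(seed)
    true_labels, naive = [], []
    for _ in range(n):
        t_event = rng.expovariate(1.0)
        if censor_rate > 0:
            t_censor = rng.expovariate(censor_rate)
        else:
            t_censor = float("inf")
        has_event = t_event <= horizon
        # The event is only recorded if it happens before censoring.
        observed = has_event and t_event <= t_censor
        true_labels.append(int(has_event))
        naive.append(int(observed))
    return true_labels, naive


def event_fraction(labels):
    """Fraction of cases labelled as having the event."""
    return sum(labels) / len(labels)


if __name__ == "__main__":
    for rate in (0.0, 0.25, 1.0, 4.0):
        true_y, naive_y = naive_labels(5000, rate)
        print(
            f"censor_rate={rate}: "
            f"true={event_fraction(true_y):.3f}, "
            f"naive={event_fraction(naive_y):.3f}"
        )
```

Under these assumptions, the naive event fraction shrinks as the censoring rate grows, so a classifier trained on the naive labels sees an increasingly dominant event-free class, which is consistent with the conclusion's remark about models becoming tuned to the majority class.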
Pages: 199-217
Number of pages: 19