A probabilistic data analytics methodology based on Bayesian Belief network for predicting and understanding breast cancer survival

Cited by: 12
Authors
Dag, Asli Z. [1]
Akcam, Zumrut [2]
Kibis, Eyyub [3]
Simsek, Serhat [3]
Delen, Dursun [4, 5]
Affiliations
[1] Creighton Univ, Heider Coll Business, Omaha, NE 68178 USA
[2] Stevens Inst Technol, Dept Comp Sci, Hoboken, NJ 07030 USA
[3] Montclair State Univ, Feliciano Sch Business, Montclair, NJ USA
[4] Oklahoma State Univ, Spears Sch Business, Stillwater, OK 74078 USA
[5] Ibn Haldun Univ, Sch Business, Istanbul, Turkey
Keywords
Breast cancer; Data mining; Genetic algorithm; Machine learning; Sensitivity analysis; Treatment decisions; Prognostic factors; Patient; Models; Stage; Selection; Ensemble; Disease; Surgery
DOI
10.1016/j.knosys.2022.108407
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Understanding breast cancer survival has proven to be a challenging problem for practitioners and researchers. Identifying the factors affecting cancer progression, their interrelationships, and their influence on patients' long-term survival helps make timely treatment decisions. The current study addresses this problem by proposing a Tree-Augmented Bayesian Belief Network (TAN)-based data analytics methodology comprising four steps: data acquisition and preprocessing; variable selection via Genetic Algorithm (GA); data balancing with synthetic minority over-sampling and random under-sampling methods; and, finally, the development of the TAN model to determine the probabilistic inter-conditional dependency structure among breast cancer-related variables along with the posterior survival probabilities. The proposed model is compared to well-known machine learning models. A what-if analysis has also been conducted to verify the associations among the variables in the TAN model. The relative importance of each variable has been investigated via sensitivity analysis. Finally, a decision support tool is developed to further explore the conditional dependency structure among the cancer-related factors. The results produced by the proposed methodology, namely the patient-specific posterior survival probabilities and the conditional relationships among the variables, can be used by healthcare professionals and physicians to improve the decision-making process in planning and managing breast cancer treatments. Our generic methodology can also accommodate other types of cancer and be applied to manage various medical procedures. © 2022 Elsevier B.V. All rights reserved.
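As a rough illustration of the final step named in the abstract, the sketch below builds the feature tree of a TAN classifier in the textbook way: estimate the conditional mutual information I(X; Y | C) for every pair of features given the class, then grow a maximum-weight spanning tree from a chosen root. The abstract does not say which structure-learning procedure the authors used, so this construction is an assumption, and the variable names (`stage`, `tumor_size`, `age_group`) and the eight toy records are hypothetical and carry no clinical meaning. In a complete TAN, the class node is additionally a parent of every feature.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from three equal-length discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) * p(y|c)) ], expressed with raw counts
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        cmi += (count / n) * math.log(ratio)
    return cmi


def tan_feature_tree(columns, labels, root=None):
    """Return {child: parent} for the feature-to-feature edges of a TAN.

    columns maps a feature name to its list of discrete values.
    """
    names = list(columns)
    root = names[0] if root is None else root
    weight = {}
    for a, b in combinations(names, 2):
        w = conditional_mutual_information(columns[a], columns[b], labels)
        weight[(a, b)] = weight[(b, a)] = w
    # Prim's algorithm for a maximum spanning tree; growing from the root
    # directs every edge away from it.
    parent = {}
    in_tree = [root]
    while len(in_tree) < len(names):
        u, v = max(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: weight[edge],
        )
        parent[v] = u
        in_tree.append(v)
    return parent


# Hypothetical toy records (not from the study): tumor_size mirrors stage,
# while age_group is independent of both within each class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "stage":      [0, 0, 1, 1, 0, 0, 1, 1],
    "tumor_size": [0, 0, 1, 1, 0, 0, 1, 1],
    "age_group":  [0, 1, 0, 1, 0, 1, 0, 1],
}
parent = tan_feature_tree(columns, labels)
print(parent)
```

On this toy input the strongest class-conditional dependency is between `stage` and `tumor_size`, so that edge enters the tree first. Real data would need smoothing of the count estimates and a discretization step for continuous variables, both of which are omitted here.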
Pages: 11