Exploring Gene Expression and Clinical Data for Identifying Prostate Cancer Severity Levels using Machine Learning Methods

Authors
Al Marouf, Ahmed [1]
Alhajj, Reda [1, 2, 3]
Rokne, Jon G. [1]
Ghose, Sunita [4]
Bismar, Tarek A. [5, 6, 7, 8, 9, 10]
Affiliations
[1] Univ Calgary, Dept Comp Sci, Calgary, AB, Canada
[2] Istanbul Medipol Univ, Dept Comp Engn, Istanbul, Turkiye
[3] Univ Southern Denmark, Dept Hlth Informat, Odense, Denmark
[4] Univ Alberta, Dept Med Oncol, Edmonton, AB, Canada
[5] Univ Calgary, Dept Oncol, Calgary, AB, Canada
[6] Univ Calgary, Dept Biochem, Calgary, AB, Canada
[7] Univ Calgary, Dept Mol Biol, Calgary, AB, Canada
[8] Univ Calgary, Cumming Sch Med, Dept Pathol & Lab, Calgary, AB, Canada
[9] Alberta Precis Lab, Calgary, AB, Canada
[10] Prostate Canc Ctr, Calgary, AB, Canada
Source
2023 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE | 2023
Keywords
Prostate cancer; Severity levels; Gleason Grading Group; Machine Learning; Random Forest; Diagnosis
DOI
10.1109/CCECE58730.2023.10288946
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Prostate cancer (PCa) is one of the most commonly diagnosed cancers in men worldwide. It originates in the prostate, a small, walnut-shaped male gland, and can metastasize from there to other organs. If it is detected and diagnosed early, the survival rate may increase to 95%. Early detection and diagnosis are therefore important tasks performed by pathologists. A pathologist identifies the severity level using a scale called the Gleason grading group (GGG). To do so, the pathologist examines a biopsy sample, assigns a grade to its most predominant tumor pattern, and then grades the second most predominant pattern in the same manner. Adding these two grades gives the total Gleason score, and the GGG is determined from that score. In this paper, we explore tissue microarray (TMA) and clinical data collected by pathologists of Alberta Precision Laboratory to predict the severity level of prostate cancer with various machine learning methods. We apply two groups of classifiers through a machine learning pipeline that includes imputation and sampling techniques. The traditional classifiers are Naive Bayes, Decision Tree, Support Vector Machine with a radial basis function (RBF) kernel, and Logistic Regression. The ensemble classifiers are Random Forest and Bagging with k-nearest neighbors. An integrated SMOTE-Tomek Links method is adopted to handle the class imbalance problem. The highest accuracy, 99.64%, is obtained with the Random Forest method.
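The abstract mentions the Gleason score and the Gleason grading group. The sketch below shows how the two relate, using the standard ISUP (2014) Grade Group scheme. A primary pattern (the most predominant) and a secondary pattern are summed into a Gleason score, which is then mapped to Grade Groups 1 to 5. This is an illustration under stated assumptions and is not the paper's code:

* Only patterns 3, 4, and 5 are accepted, following contemporary practice.
* The function name `grade_group` is hypothetical.
* The paper's dataset may encode GGG differently.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map primary and secondary Gleason patterns to an ISUP Grade Group (1-5).

    Assumes contemporary practice, where only patterns 3, 4 and 5 are assigned.
    """
    for pattern in (primary, secondary):
        if pattern not in (3, 4, 5):
            raise ValueError(f"unsupported Gleason pattern: {pattern}")
    score = primary + secondary  # total Gleason score, 6 to 10
    if score == 6:
        return 1
    if score == 7:
        # 3+4 (mostly pattern 3) is graded as less aggressive than 4+3
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5  # Gleason score 9 or 10
```

For example, `grade_group(3, 4)` returns 2 while `grade_group(4, 3)` returns 3. Both have a Gleason score of 7, which is why the Grade Group cannot be derived from the summed score alone.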
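The class-imbalance step in the abstract combines SMOTE oversampling with Tomek-link cleaning. The pure-Python sketch below illustrates that combination and is not the authors' code. It rests on these assumptions:

* Features are numeric tuples, and missing values have already been imputed.
* Every class is oversampled up to the majority-class size.
* Both members of each Tomek link are removed. Removing only the majority-class member is another common choice.
* The function names `smote`, `tomek_links`, and `smote_tomek` are hypothetical.

In practice, a library implementation such as `SMOTETomek` from the imbalanced-learn package would typically be used.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic points, each interpolated between a random
    sample of the class and one of its k nearest neighbours in that class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def tomek_links(X, y):
    """Indices of samples in a Tomek link: a pair from different classes
    that are each other's nearest neighbour."""
    def nearest(i):
        candidates = (j for j in range(len(X)) if j != i)
        return min(candidates, key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    linked = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            linked.update((i, j))
    return linked


def smote_tomek(X, y, k=5, seed=0):
    """Oversample every class to the majority size with SMOTE, then remove
    both members of each Tomek link (this sketch's cleaning choice)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k, rng)
            X_res.extend(new)
            y_res.extend([label] * len(new))
    drop = tomek_links(X_res, y_res)
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]
```

Consider six class-0 points near the origin and two class-1 points near (1, 1). Here `smote_tomek` grows class 1 to six samples, all lying on the segment between its two original points. Because the clusters are well separated, no Tomek links form and all 12 samples are kept. Resampling of this kind should be applied only to the training split, so that synthetic points never leak into the evaluation data.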
Pages: 6