An Automated Histopathological Colorectal Cancer Multi-Class Classification System Based on Optimal Image Processing and Prominent Features

Citations: 0
Authors
Tonni, Tasnim Jahan [1]
Rana, Shakil [1]
Fatema, Kaniz [1]
Karim, Asif [2]
Rony, Md. Awlad Hossen [1]
Hasan, Md. Zahid [1]
Mukta, Md. Saddam Hossain [3]
Azam, Sami [2]
Affiliations
[1] Daffodil Int Univ, Dept Comp Sci & Engn, Hlth Informat Res Lab HIRL, Dhaka, Bangladesh
[2] Charles Darwin Univ, Fac Sci & Technol, Darwin, Northern Territory, Australia
[3] LUT Univ, LUT Sch Engn Sci, Lappeenranta, Finland
Keywords
colorectal cancer; ensemble model; feature selection; handcrafted features; image preprocessing; machine learning; multi-class classification;
DOI
10.1111/coin.70007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Colorectal cancer (CRC) is characterized by the uncontrolled growth of cancerous cells within the rectal mucosa. Colon polyps, which are precancerous growths, can develop into colon cancer, causing symptoms such as rectal bleeding, abdominal pain, diarrhea, weight loss, and constipation. CRC is a leading cause of cancer death worldwide, and this potentially fatal cancer severely afflicts the elderly. Early diagnosis is crucial for effective treatment, yet manual diagnosis is often time-consuming and laborious for experts. This study improved the accuracy of CRC multi-class classification compared to previous research, using the NCT-CRC-HE-100K (100,000 images) and CRC-VAL-HE-7K (7,180 images) datasets. Initially, we applied various image processing techniques to the NCT-CRC-HE-100K dataset to improve image quality and reduce noise. We then applied multiple feature extraction and selection methods to identify prominent features from a large pool of data, and experimented with different approaches to select the best classifiers for these features. The third ensemble model (XGB-LightGBM-RF) achieved the best accuracy of 99.63% with 40 prominent features chosen by univariate feature selection. The same ensemble achieved 99.73% accuracy on the CRC-VAL-HE-7K dataset and 99.27% accuracy on the two datasets combined. In addition, we trained and tested the model across datasets, using 80% of NCT-CRC-HE-100K for training and 20% of CRC-VAL-HE-7K for testing; in this setting the third ensemble model obtained 98.43% accuracy in multi-class classification. The results show that this new framework, built on the third ensemble model, can help experts identify the type of CRC tissue at the very beginning of an investigation.
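The two steps named in the abstract, univariate feature selection followed by an ensemble vote, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the abstract does not say which univariate score or which voting rule was used, so the one-way ANOVA F statistic and hard majority voting below are assumptions, and the function names (`anova_f_score`, `select_k_best`, `majority_vote`) and the toy data are hypothetical. The class labels `TUM` and `NORM` are used only as example strings.

```python
from collections import Counter
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one feature column.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(rows, labels, k):
    """Return the sorted indices of the k features with the highest F score."""
    n_features = len(rows[0])
    scores = [
        anova_f_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def majority_vote(predictions):
    """Hard voting: the label predicted by most models for one sample.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy data (hypothetical): 4 samples, 3 handcrafted features each.
rows = [
    [1.0, 5.0, 0.3],
    [1.1, 4.0, 0.9],
    [3.0, 5.5, 0.2],
    [3.2, 4.5, 0.8],
]
labels = ["TUM", "TUM", "NORM", "NORM"]

kept = select_k_best(rows, labels, k=2)
reduced = [[row[j] for j in kept] for row in rows]

# Stand-in outputs of three classifiers for a single sample.
vote = majority_vote(["TUM", "TUM", "NORM"])
```

In the study's setting, `k` would be 40 and the three votes would come from the XGBoost, LightGBM, and random forest models trained on the reduced feature matrix; whether the paper combines them by hard or soft voting is not stated in the abstract.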
Pages: 20