An Automated Histopathological Colorectal Cancer Multi-Class Classification System Based on Optimal Image Processing and Prominent Features

Citations: 0
Authors
Tonni, Tasnim Jahan [1]
Rana, Shakil [1]
Fatema, Kaniz [1]
Karim, Asif [2]
Rony, Md. Awlad Hossen [1]
Hasan, Md. Zahid [1]
Mukta, Md. Saddam Hossain [3]
Azam, Sami [2]
Affiliations
[1] Daffodil Int Univ, Dept Comp Sci & Engn, Hlth Informat Res Lab HIRL, Dhaka, Bangladesh
[2] Charles Darwin Univ, Fac Sci & Technol, Darwin, Northern Territory, Australia
[3] LUT Univ, LUT Sch Engn Sci, Lappeenranta, Finland
Keywords
colorectal cancer; ensemble model; feature selection; handcrafted features; image preprocessing; machine learning; multi-class classification
DOI
10.1111/coin.70007
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Colorectal cancer (CRC) is characterized by the uncontrolled growth of cancerous cells within the rectal mucosa. Colon polyps, which are precancerous growths, can develop into colon cancer, causing symptoms such as rectal bleeding, abdominal pain, diarrhea, weight loss, and constipation. CRC is among the leading causes of cancer death worldwide, and it severely afflicts the elderly. Early diagnosis is crucial for effective treatment, but manual diagnosis is often time-consuming and laborious for experts. This study improves on the CRC multi-class classification accuracy reported in previous research, using two datasets: NCT-CRC-HE-100K (100,000 images) and CRC-VAL-HE-7K (7,180 images). We first applied various image processing techniques to the NCT-CRC-HE-100K dataset to improve image quality and reduce noise. We then applied multiple feature extraction and selection methods to identify prominent features from a large pool of data, and experimented with different approaches to select the best classifiers for these features. The third ensemble model (XGB-LightGBM-RF) achieved its best accuracy of 99.63% with 40 prominent features chosen by univariate feature selection. The same model achieved 99.73% accuracy on the CRC-VAL-HE-7K dataset and 99.27% accuracy on the two datasets combined. We also trained and tested the model across datasets: with 80% of NCT-CRC-HE-100K used for training and 20% of CRC-VAL-HE-7K used for testing, the third ensemble model obtained 98.43% accuracy in multi-class classification. The results suggest that this framework, built on the third ensemble model, can help experts identify the type of CRC disease at the very beginning of an investigation.
Pages: 20