LightGBM: A Leading Force in Breast Cancer Diagnosis Through Machine Learning and Image Processing

Cited by: 4
Authors
Kanber, Bassam M. [1 ]
Al Smadi, Ahmad [2 ]
Noaman, Naglaa F. [1 ]
Liu, Bo [1 ]
Gou, Shuiping [1 ]
Alsmadi, Mutasem K. [3 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
[2] Zarqa Univ, Dept Data Sci & Artificial Intelligence, Zarqa 13100, Jordan
[3] Imam Abdulrahman Bin Faisal Univ, Coll Appl Studies & Community Serv, Dept Management Informat Syst, Dammam 34212, Saudi Arabia
Keywords
Breast cancer; Histopathology; Image processing; Biomedical imaging; Machine learning; Feature extraction; Image classification; Medical diagnostic imaging; Performance evaluation; Histopathological images
DOI
10.1109/ACCESS.2024.3375755
CLC number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The early diagnosis of breast cancer (BC), a prominent global cause of mortality, necessitates the development of innovative diagnostic strategies. This study leverages machine learning (ML) and advanced image processing techniques to analyze histopathology images, thereby augmenting the capabilities for BC diagnosis. A robust feature extraction (FE) pipeline is developed, integrating techniques such as color histogram analysis, contour FE, Hu moments, and Haralick texture features. Ten ML algorithms, including LightGBM (LGBM), CatBoost, and XGBoost, are systematically evaluated across varying magnifications of the BreakHis dataset to assess their diagnostic performance. The research introduces a novel approach by combining distinct FE techniques, enhancing the model's ability to distinguish between benign and malignant tissues with exceptional accuracy. These integrated techniques significantly elevate BC diagnostic accuracy and reliability, holding the potential to positively impact patient outcomes and healthcare systems. Notably, the combination of the FE pipeline and LGBM achieves the highest accuracy, reported in two forms: accuracies before augmentation (0.9598 for 40x, 0.9516 for 100x, 0.9652 for 200x, 0.9535 for 400x, and 0.9570 for all magnifications combined) and accuracies after augmentation (0.9949 for 40x, 0.9870 for 100x, 0.9987 for 200x, and 0.9918 for 400x) for classifying histopathological images at each magnification. Moreover, the study highlights the crucial role of augmentation in further refining classification accuracy. Extending its applicability, the proposed method is also successfully applied to the classification of lung and colon cancer images (LC25000 dataset), achieving an impressive accuracy of 0.9983. The model demonstrates its effectiveness and adaptability as a compelling method for histopathological image classification.
This research contributes to the evolving field of BC diagnostics, offering a framework for robust and accurate ML-based diagnostic tools that may revolutionize cancer diagnosis and enhance patient care.
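The feature extraction pipeline named in the abstract can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the function names, the bin count, the number of gray levels, the horizontal-neighbor offset, and the channel-mean grayscale conversion are all assumptions made for the example. Contour features are omitted, since they would typically rely on an image library such as OpenCV, and only three of the Haralick statistics are computed.

```python
import numpy as np


def color_histogram(img, bins=8):
    """Per-channel histogram of an H x W x 3 uint8 image, each normalized to sum to 1."""
    feats = []
    for c in range(img.shape[2]):
        hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def hu_moments(gray):
    """Seven Hu invariant moments of a 2-D intensity image (assumes a non-zero image)."""
    g = gray.astype(np.float64)
    y, x = np.mgrid[: g.shape[0], : g.shape[1]]
    m00 = g.sum()
    xc, yc = (x * g).sum() / m00, (y * g).sum() / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = ((x - xc) ** p * (y - yc) ** q * g).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])


def glcm_texture(gray, levels=8):
    """Three Haralick statistics from a symmetric horizontal-neighbor co-occurrence matrix."""
    q = (gray.astype(np.int64) * levels) // 256  # quantize 0..255 to 0..levels-1
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    glcm = glcm + glcm.T  # count each pixel pair in both directions
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    asm = (p ** 2).sum()                    # angular second moment
    contrast = (p * (i - j) ** 2).sum()     # contrast
    idm = (p / (1 + (i - j) ** 2)).sum()    # inverse difference moment
    return np.array([asm, contrast, idm])


def extract_features(img):
    """Concatenate color, shape-moment, and texture features into one vector."""
    gray = img.mean(axis=2)  # simple channel-mean grayscale (an assumption)
    return np.concatenate([color_histogram(img), hu_moments(gray), glcm_texture(gray)])


# Hypothetical usage on a random stand-in image (not BreakHis data).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
features = extract_features(img)
print(features.shape)
```

In a setup like the one the abstract describes, one such vector per image would be stacked into a feature matrix and passed to a gradient-boosting classifier such as LightGBM; that step is left out here to keep the sketch dependency-free.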
Pages: 39811-39832
Page count: 22