Volatile Organic Compounds for the Prediction of Lung Cancer by Using Ensembled Machine Learning Model and Feature Selection

Cited by: 2
Authors
Khanna, Divya [1]
Kumar, Arun [2]
Bhat, Shahid Ahmad [3]
Affiliations
[1] Chitkara Univ, Inst Engn & Technol, Rajpura 140401, Punjab, India
[2] Madhav Inst Sci & Technol, Ctr Artificial Intelligence, Gwalior 474005, Madhya Pradesh, India
[3] LUT Univ, LUT Business Sch, Lappeenranta 53851, Finland
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Lung cancer; Cancer; Predictive models; Volatile organic compounds; Machine learning; Lungs; Feature extraction; Analytical models; Support vector machines; Biomarkers; VOCs; lung cancer; biomarkers; machine learning models; ensemble model; ensemble feature selection approach; B-CELL EPITOPES; ALLERGENIC PROTEINS; CLASSIFICATION; BIOMARKERS; LOCATION; DISEASE; SCENT
DOI
10.1109/ACCESS.2025.3527027
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The advancement of biomarkers is critically important because lung cancer is a leading cause of death. In the present study, volatile organic compounds (VOCs) are considered as biomarkers for predicting lung cancer. VOCs from seven different sources (breath, blood, urine, cell line, pleural fluid, cancer tissue, and lung tissue) are targeted to enhance prediction reliability. The study focuses on feature selection and model fusion. Five built-in machine learning models and one proposed ensemble model are used to investigate the different types of VOCs. The idea behind the ensemble model is to combine multiple individual models, using optimal feature sets, for better performance; this reasoning led to the design of an ensemble model for predicting breath VOCs. The AvNNet model outperforms the four other models in predicting blood, cancer tissue, cell line, and urine VOCs, achieving accuracies of 70%, 80%, 70%, and 90%, respectively, on the validation dataset. The Blackboost model achieved 90% accuracy on the validation dataset in predicting lung tissue VOCs. The random forest model also reaches 90% accuracy on the validation dataset when predicting pleural fluid VOCs. Compared with the individual models, the proposed ensemble model predicts breath VOCs more effectively, achieving 100% accuracy on the validation dataset.
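The abstract does not say how the ensemble combines its base models, so the following is only a minimal sketch of one common fusion rule, majority voting, and not the authors' method. The function names, the three base models, their predictions, and the labels are all hypothetical toy values chosen for illustration; they are not data or results from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote.

    `predictions` is a list with one entry per model; each entry is that
    model's list of predicted labels, in the same sample order.
    """
    combined = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of samples where the predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy predictions from three base models on five samples.
model_a = ["cancer", "cancer", "control", "control", "control"]
model_b = ["cancer", "control", "control", "cancer", "cancer"]
model_c = ["control", "cancer", "control", "control", "cancer"]
truth = ["cancer", "cancer", "control", "control", "cancer"]

ensemble = majority_vote([model_a, model_b, model_c])
```

In this toy setup each base model is wrong on at least one sample, but the errors fall on different samples, so the vote can recover the true label where a single model fails. An odd number of voters avoids ties for a two-class problem.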
Pages: 9809-9820
Number of pages: 12