Volatile Organic Compounds for the Prediction of Lung Cancer by Using Ensembled Machine Learning Model and Feature Selection

Cited by: 1
Authors
Khanna, Divya [1]
Kumar, Arun [2]
Bhat, Shahid Ahmad [3]
Affiliations
[1] Chitkara Univ, Inst Engn & Technol, Rajpura 140401, Punjab, India
[2] Madhav Inst Sci & Technol, Ctr Artificial Intelligence, Gwalior 474005, Madhya Pradesh, India
[3] LUT Univ, LUT Business Sch, Lappeenranta 53851, Finland
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Lung cancer; Cancer; Predictive models; Volatile organic compounds; Machine learning; Lungs; Feature extraction; Analytical models; Support vector machines; Biomarkers; VOCs; lung cancer; biomarkers; machine learning models; ensemble model; ensemble feature selection approach; B-CELL EPITOPES; ALLERGENIC PROTEINS; CLASSIFICATION; BIOMARKERS; LOCATION; DISEASE; SCENT;
DOI
10.1109/ACCESS.2025.3527027
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Developing reliable biomarkers is critically important because lung cancer is a leading cause of death. This study considers volatile organic compounds (VOCs) as biomarkers for predicting lung cancer. To improve prediction reliability, VOCs from seven sources are examined: breath, blood, urine, cell line, pleural fluid, cancer tissue, and lung tissue. The study focuses on feature selection and model fusion. Five built-in machine learning models and one proposed ensemble model are used to investigate the different types of VOCs. The ensemble model combines multiple individual models, using optimal feature sets, to improve performance, and it is designed to predict breath VOCs. The AvNNet model outperforms the four other models in predicting blood, cancer tissue, cell line, and urine VOCs, achieving accuracies of 70%, 80%, 70%, and 90%, respectively, on the validation dataset. The Blackboost model achieves 90% accuracy on the validation dataset for lung tissue VOCs, and the random forest model achieves 90% accuracy on the validation dataset for pleural fluid VOCs. Compared with the individual models, the proposed ensemble model predicts breath VOCs more effectively, achieving 100% accuracy on the validation dataset.
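The abstract does not state how the feature sets are chosen or how the base models are combined. As a rough illustration only, the sketch below assumes agreement-based feature selection (keep a feature if enough selectors rank it highly) and simple majority voting over base-model predictions. The function names, feature names, and labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ensemble_select(rankings, top_k, min_votes):
    """Keep features that appear in the top_k of at least min_votes rankings."""
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])
    return sorted(name for name, count in votes.items() if count >= min_votes)


def majority_vote(predictions):
    """Combine per-model label lists into one majority label per sample."""
    combined = []
    for sample_preds in zip(*predictions):
        label, _ = Counter(sample_preds).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical feature rankings from three selectors (placeholder names).
rankings = [
    ["voc_a", "voc_b", "voc_c", "voc_d"],
    ["voc_b", "voc_a", "voc_d", "voc_c"],
    ["voc_c", "voc_b", "voc_a", "voc_d"],
]
selected = ensemble_select(rankings, top_k=2, min_votes=2)

# Hypothetical predictions from three base models on four samples.
predictions = [
    ["cancer", "control", "cancer", "control"],
    ["cancer", "cancer", "cancer", "control"],
    ["control", "control", "cancer", "control"],
]
combined = majority_vote(predictions)
```

With an odd number of base models and two classes, majority voting never ties; other combination schemes, such as averaging predicted probabilities or stacking, would need a different combiner.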
Pages: 9809-9820
Page count: 12