A novel feature selection algorithm for identifying hub genes in lung cancer

Cited by: 7
Authors
Mohamed, Tehnan I. A. [1 ,3 ]
Ezugwu, Absalom E. [2 ]
Fonou-Dombeu, Jean Vincent [1 ]
Mohammed, Mohanad [1 ]
Greeff, Japie [4 ]
Elbashir, Murtada K. [5 ]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, King Edward Ave Pietermaritzburg Campus, ZA-3201 Pietermaritzburg, South Africa
[2] Northwest Univ, Unit Data Sci & Comp, Potchefstroom, South Africa
[3] Univ Gezira, Dept Comp Sci, Fac Math & Comp Sci, Wad Madani 11123, Sudan
[4] Northwest Univ, Sch Comp Sci & Informat Syst, Fac Nat & Agr Sci, Vanderbijlpark, South Africa
[5] Jouf Univ, Dept Informat Syst, Coll Comp & Informat Sci, Sakaka 72388, Saudi Arabia
Keywords
BREAST-CANCER; CLASSIFICATION; NETWORK; SEARCH; TREE
DOI
10.1038/s41598-023-48953-1
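Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09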
Abstract
Lung cancer, a life-threatening disease primarily affecting lung tissue, remains a significant contributor to mortality in both developed and developing nations. Accurate biomarker identification is imperative for effective cancer diagnosis and therapeutic strategies. This study introduces the Voting-Based Enhanced Binary Ebola Optimization Search Algorithm (VBEOSA), an ensemble-based approach that combines binary optimization with the Ebola optimization search algorithm. VBEOSA combines state-of-the-art classification models through soft voting. We apply VBEOSA to an extensive lung cancer gene expression dataset obtained from TCGA, after preprocessing steps that include outlier detection and removal, data normalization, and filtration. VBEOSA performs feature selection, leading to the discovery of key hub genes closely associated with lung cancer, which are validated through protein-protein interaction analysis. Our investigation reveals ten significant hub genes (ADRB2, ACTB, ARRB2, GNGT2, ADRB1, ACTG1, ACACA, ATP5A1, ADCY9, and ADRA1B), each demonstrating substantial involvement in lung cancer. Our pathway analysis highlights pathways such as salivary secretion and the calcium signaling pathway, providing insight into the molecular mechanisms underpinning lung cancer. We also use weighted gene co-expression network analysis (WGCNA) to identify gene modules that correlate strongly with clinical attributes associated with lung cancer. Our findings underscore the efficacy of VBEOSA in feature selection and offer insight into the multifaceted molecular landscape of lung cancer.
Finally, we are confident that this research has the potential to improve diagnostic capabilities and further enrich our understanding of the disease, setting the stage for future advances in the clinical management of lung cancer. The VBEOSA source code is publicly available at https://github.com/TEHNAN/VBEOSA-A-Novel-Feature-Selection-Algorithm-for-Identifying-hub-Genes-in-Lung-Cancer.
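The abstract names two generic building blocks: a binary mask that selects a feature subset, and soft voting across several classifiers. The sketch below illustrates only those two ideas in plain Python. It is not the authors' VBEOSA implementation; the function names `apply_mask` and `soft_vote` and the example probabilities are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def apply_mask(sample: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep only the features whose mask bit is 1 (binary feature selection)."""
    if len(sample) != len(mask):
        raise ValueError("sample and mask must have the same length")
    return [value for value, keep in zip(sample, mask) if keep]


def soft_vote(probabilities: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities across classifiers and return the
    index of the class with the highest mean probability."""
    if not probabilities:
        raise ValueError("need at least one classifier output")
    n_classes = len(probabilities[0])
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("all classifiers must report the same classes")
    n_models = len(probabilities)
    averaged = [
        sum(p[c] for p in probabilities) / n_models for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


# Hypothetical outputs of three classifiers for one sample (two classes).
# Two of the three lean towards class 1, but the first is very confident
# in class 0, so the averaged probabilities favour class 0.
example_probabilities = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
predicted_class = soft_vote(example_probabilities)  # class 0

# Hypothetical 4-gene expression vector reduced by a binary mask.
selected = apply_mask([5.1, 0.2, 3.3, 7.0], [1, 0, 1, 0])  # [5.1, 3.3]
```

In this example a majority (hard) vote would pick class 1, while soft voting picks class 0 because it weighs each classifier's confidence. In a wrapper-style feature selection loop, an optimizer would propose masks and score each one by the accuracy of such an ensemble on the masked data.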
Pages: 19