BIMSSA: enhancing cancer prediction with salp swarm optimization and ensemble machine learning approaches
Cited by: 8
Authors:
Panda, Pinakshi [1]
Bisoy, Sukant Kishoro [1]
Panigrahi, Amrutanshu [2]
Pati, Abhilash [2]
Sahu, Bibhuprasad [3]
Guo, Zheshan [4]
Liu, Haipeng [5]
Jain, Prince [6]
Affiliations:
[1] CV Raman Global Univ, Dept Comp Sci & Engn, Bhubaneswar, Odisha, India
[2] Siksha O Anusandhan, Dept Comp Sci & Engn, Bhubaneswar, Odisha, India
[3] Vardhaman Coll Engn Autonomous, Dept Informat Technol, Hyderabad, Telangana, India
[4] Hainan Univ, Sch Biomed Engn, Key Lab Biomed Engn Hainan Prov, Sanya, Peoples R China
[5] Coventry Univ, Ctr Intelligent Healthcare, Coventry, England
[6] Parul Univ, Parul Inst Technol, Dept Mechatron Engn, Vadodara, Gujarat, India
Keywords:
cancer prediction;
microarray data;
feature selection;
swarm intelligence;
ensemble learning;
GENE-EXPRESSION DATA;
ALGORITHM;
SELECTION;
DIAGNOSIS;
ENTROPY;
DOI:
10.3389/fgene.2024.1491602
CLC number:
Q3 [Genetics];
Subject classification codes:
071007 ;
090102 ;
Abstract:
Background: Cancer rates are rising rapidly and contribute substantially to global mortality. According to the World Health Organization (WHO), 9.9 million people died from cancer in 2020. Machine learning (ML) can help identify cancer early, thereby reducing deaths. An ML-based cancer diagnostic model can use a patient's genetic information, such as microarray data. Microarray data are high-dimensional, which can degrade the performance of ML-based models; feature selection therefore becomes essential.
Methods: The Salp Swarm Optimization Algorithm (SSA), Improved Maximum Relevance and Minimum Redundancy (IMRMR), and Boruta form the basis of this work's ML-based model, BIMSSA. BIMSSA implements a pipelined feature selection method to handle high-dimensional microarray data effectively. First, Boruta and IMRMR were applied to extract relevant gene expression features. Then, SSA was applied to optimize the size of the feature subset. On the optimized feature space, five machine learning classifiers were applied as base learners: Support Vector Machine (SVM), Random Forest (RF), Extreme Learning Machine (ELM), AdaBoost, and XGBoost. Finally, majority voting was used to build an ensemble of the top three algorithms. The ensemble model BIMSSA was evaluated on microarray data from four cancer types: adult acute lymphoblastic leukemia and acute myelogenous leukemia (ALL-AML), lymphoma, mixed-lineage leukemia (MLL), and small round blue cell tumors (SRBCT).
Results: In empirical evaluations, the proposed BIMSSA (Boruta + IMRMR + SSA) achieved the following accuracies: 96.7% on ALL-AML, 96.2% on Lymphoma, 95.1% on MLL, and 97.1% on the SRBCT cancer dataset.
Conclusion: The results show that the proposed approach can accurately predict different forms of cancer, which is useful for both physicians and researchers.
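The abstract names the SSA stage and the majority-voting ensemble but does not give their implementation details. The Python sketch below is a rough, hypothetical illustration of those two steps only. It is not the authors' code.

The SSA part follows the standard salp swarm update. A leader salp moves around the best-so-far "food" position with a step coefficient c1 = 2·exp(-(4t/T)²). Each follower moves to the midpoint between itself and the salp ahead of it. Positions are turned into feature masks by thresholding at 0.5. The voting part is a plain hard majority vote over per-model label lists.

Several details are assumptions: the 0.5 binarization rule, the names `binary_ssa`, `majority_vote`, `toy_fitness` and `RELEVANT`, and the toy fitness function. The toy fitness stands in for whatever classifier-based objective BIMSSA actually optimizes. The Boruta and IMRMR pre-filtering stages are omitted.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence


def binary_ssa(fitness: Callable[[List[int]], float], n_features: int,
               n_salps: int = 20, max_iter: int = 50, seed: int = 0) -> List[int]:
    """Minimize `fitness` over 0/1 feature masks with a salp swarm search.

    Positions live in [0, 1]; feature j is selected when coordinate j > 0.5
    (an assumed binarization rule, not taken from the paper).
    """
    rng = random.Random(seed)
    lb, ub = 0.0, 1.0
    salps = [[rng.random() for _ in range(n_features)] for _ in range(n_salps)]

    def to_mask(pos: List[float]) -> List[int]:
        return [1 if x > 0.5 else 0 for x in pos]

    # Food source F: the best position found so far.
    food_pos = min(salps, key=lambda p: fitness(to_mask(p)))[:]
    food_fit = fitness(to_mask(food_pos))

    for t in range(1, max_iter + 1):
        # c1 decays from about 2 toward 0, shifting exploration to exploitation.
        c1 = 2 * math.exp(-((4 * t / max_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: random step of size c1 * c2 around the food source.
                for j in range(n_features):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    if rng.random() < 0.5:
                        salps[0][j] = food_pos[j] + step
                    else:
                        salps[0][j] = food_pos[j] - step
            else:
                # Follower: midpoint of itself and the salp in front of it.
                for j in range(n_features):
                    salps[i][j] = (salps[i][j] + salps[i - 1][j]) / 2
            # Keep every coordinate inside [lb, ub].
            salps[i] = [min(max(x, lb), ub) for x in salps[i]]
        # Move the food source if any salp found a better mask.
        for pos in salps:
            f = fitness(to_mask(pos))
            if f < food_fit:
                food_fit, food_pos = f, pos[:]
    return to_mask(food_pos)


def majority_vote(predictions: Sequence[Sequence]) -> List:
    """Hard majority vote; predictions[m][k] is model m's label for sample k.

    On a tie, Counter.most_common returns the label seen first (first model).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy fitness: pretend genes 0, 2 and 5 are the informative ones.
# A real wrapper would use a classifier's validation error instead.
RELEVANT = {0, 2, 5}


def toy_fitness(mask: List[int], alpha: float = 0.9) -> float:
    selected = {j for j, bit in enumerate(mask) if bit}
    missed = len(RELEVANT - selected) / len(RELEVANT)
    # Weighted sum of "error" and the fraction of features kept.
    return alpha * missed + (1 - alpha) * len(selected) / len(mask)


if __name__ == "__main__":
    print("selected mask:", binary_ssa(toy_fitness, n_features=8, seed=3))
    # Three hypothetical base learners voting on four samples.
    print(majority_vote([[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]))  # [0, 1, 1, 0]
```

In a fuller pipeline, `toy_fitness` would plausibly be replaced by a cross-validated classification error on the Boruta/IMRMR-filtered genes, and `majority_vote` would receive the test-set predictions of the three best base learners. With three voters and two classes, hard voting can never tie.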
Pages: 22