A Novel Feature Selection Method for Ultra High Dimensional Survival Data

Cited by: 1
Authors
Salma, Nahid [1 ,2 ]
Al-Rammahi, Ali Hussain Mohammed [1 ]
Ali, Majid Khan Majahar [1 ]
Affiliations
[1] Univ Sains Malaysia, Sch Math Sci, Gelugor, Penang, Malaysia
[2] Jahangirnagar Univ, Dept Stat & Data Sci, Dhaka 1342, Bangladesh
Source
MALAYSIAN JOURNAL OF FUNDAMENTAL AND APPLIED SCIENCES | 2024 / Vol. 20 / No. 05
Keywords
Ultra-high dimension; renal cell carcinoma; Cox model; Freund model; feature selection; VARIABLE SELECTION; CANCER; SHRINKAGE; PROGNOSIS; MODELS;
DOI
10.11113/mjfas.v20n5.3665
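Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes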
07 ; 0710 ; 09 ;
Abstract
Identifying relevant features in ultra-high dimensional survival data is one of the most important and fundamental objectives in biological discovery and statistical acquisition. Conventional survival regression algorithms are challenged by the exponential increase in raw data. In real-world scenarios, ultra-high dimensionality affects data processing, particularly for two-component structures such as the kidneys, lungs, and eyes. Gene interactions between the two components affect both future system stability and the frequency of illness. Traditional statistical procedures for survival systems are restricted to a single component. To date, no feature selection method is available for ultra-high-dimensional survival data with two compartments. To determine the best-performing methods in this setting, this study proposed and compared ten variable selection approaches for ultra-high dimensional Renal Cell Carcinoma (RCC) survival data containing two compartments. The study used Freund's baseline hazard function as the baseline hazard of the Cox model (Lasso Freund, Robust Lasso Freund, Elastic Net Freund) and integrated these models with sure independence screening (SIS) and iterative sure independence screening (ISIS) (i.e., LF-SIS, RLF-SIS, ENF-SIS, LF-ISIS, RLF-ISIS, ENF-ISIS). Additionally, two basic approaches, LASSO and Elastic Net (EN), were considered, and EN was combined with SIS and ISIS (EN-SIS, EN-ISIS). The model validation measures, including MSE (340.000), SSE (25300.0), and RMSE (16.490), suggest that the Robust Lasso Freund-Iterative Sure Independence Screening (RLF-ISIS) and Robust Lasso Freund-Sure Independence Screening (RLF-SIS) strategies outperform the other proposed approaches in terms of precision in selecting variables.
However, both methods showed a lower R² (0.71), which suggests the presence of outliers in the dataset. Box plots of some selected predictive genes also confirm the presence of outliers. Furthermore, the two methods, RLF-ISIS and RLF-SIS, were used to identify 49 and 68 genes that have both direct and indirect effects on patients with RCC. In conclusion, although RLF-SIS and RLF-ISIS outperform the other proposed approaches and can be regarded as variable selection strategies, they might not be the optimal choice for ultra-high dimensional survival data with outliers. The study can be extended in the future by applying competing risk theory to sequential and parallel structures, which serve as the basis for most complex mechanical systems found in manufacturing facilities. Notably, no feature selection method is available for ultra-high-dimensional survival data with both outliers and two compartments. To address this issue, further research should focus on developing an advanced hybrid feature selection approach, with a particular emphasis on deep learning strategies.
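The screening step named in the abstract (SIS) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain continuous response and ranks features by absolute marginal correlation, whereas a survival analysis would normally use a censoring-aware marginal utility (for example, one Cox fit per feature). The function name `sis_screen`, the default cutoff `d = n / log(n)`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def sis_screen(X, y, d=None):
    """Keep the d features with the largest absolute marginal correlation with y.

    Illustrative only: censoring is ignored, and the utility used in the
    paper may differ from plain correlation.
    """
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # assumed default cutoff, not taken from the paper
    d = min(d, p)

    # Centre the data so the inner products below are proportional to correlations.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom[denom == 0] = np.inf  # constant columns receive zero utility
    utility = np.abs(Xc.T @ yc) / denom

    # Indices of the d highest-utility features, returned in ascending order.
    top = np.argsort(utility)[::-1][:d]
    return np.sort(top)


# Simulated example: 100 samples, 2000 features, two of which carry signal.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 5] - 2.0 * X[:, 42] + 0.1 * rng.standard_normal(n)

selected = sis_screen(X, y)
print(len(selected), 5 in selected, 42 in selected)
```

In a pipeline like the one the abstract describes, the retained columns `X[:, selected]` would then be passed to a penalized model (Lasso or Elastic Net); the iterative variant (ISIS) repeats screening on what the current fit leaves unexplained.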
Pages: 1149-1171
Number of pages: 23