highMLR: An open-source package for R with machine learning for feature selection in high dimensional cancer clinical genome time to event data

Cited by: 2
Authors
Bhattacharjee A. [1, 2]
Vishwakarma G.K. [3 ]
Banerjee S. [3 ]
Pashchenko A.F. [4 ]
Affiliations
[1] Section of Biostatistics, Centre for Cancer Epidemiology, Tata Memorial Centre
[2] Homi Bhabha National Institute, Mumbai
[3] Department of Mathematics & Computing, Indian Institute of Technology Dhanbad, Dhanbad
[4] Laboratory of Intellectual Control Systems and simulation, V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences
Keywords
Feature selection; Gene expression; High dimension; Machine learning; Survival data
DOI
10.1016/j.eswa.2022.118432
Abstract
Machine learning techniques, widely used for dimensionality reduction and pattern recognition, have been applied extensively in data mining. In survival analysis, where the primary outcome is the time until a specific event occurs, identifying relevant features is essential for building an efficient prediction model, and machine learning is well suited to this task. However, machine learning remains underused on high-dimensional survival data because convenient programming functions and packages are lacking. In this article, we develop an efficient machine learning procedure for analyzing survival data associated with high-dimensional gene expression. Although several R libraries are available for machine learning, none supports machine learning with classification on high-dimensional survival data. highMLR, the R package we developed, implements machine learning methods on high-dimensional survival data and provides feature selection based on the logarithmic loss function. Several statistical methods for survival analysis have been incorporated into this machine learning algorithm. A high-dimensional gene expression dataset is analyzed with the proposed R library to demonstrate its efficacy in feature selection. © 2022 Elsevier Ltd
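The idea of ranking features by logarithmic loss, as mentioned in the abstract, can be illustrated with a minimal sketch. This is not the highMLR API and not the authors' procedure: it is written in Python rather than R, the function names (`log_loss`, `fit_logistic`, `rank_features`) and the toy data are hypothetical, and it treats event status as a plain binary label while ignoring censoring and event times, which a real survival analysis would have to handle.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary logarithmic loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standardize(x):
    """Centre and scale one feature so gradient descent behaves well."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mean) / sd for v in x]


def fit_logistic(x, y, lr=0.5, steps=500):
    """Fit a one-feature logistic model by gradient descent on the log loss."""
    w, b, n = 0.0, 0.0, len(x)
    for _ in range(steps):
        p = [sigmoid(w * xi + b) for xi in x]
        grad_w = sum((pi - yi) * xi for pi, yi, xi in zip(p, y, x)) / n
        grad_b = sum(pi - yi for pi, yi in zip(p, y)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rank_features(features, y):
    """Return (name, log loss) pairs sorted from lowest to highest loss."""
    scores = {}
    for name, x in features.items():
        xs = standardize(x)
        w, b = fit_logistic(xs, y)
        scores[name] = log_loss(y, [sigmoid(w * v + b) for v in xs])
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical toy data: 1 = event observed, 0 = no event (censoring ignored).
event = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "gene_a": [2.1, 1.8, 2.5, 0.9, 0.4, 0.7, 1.1, 0.2],  # tracks the event
    "gene_b": [1.0, 0.2, 0.8, 0.3, 0.9, 0.1, 0.7, 0.4],  # close to noise
}

ranking = rank_features(expression, event)
for name, loss in ranking:
    print(name, round(loss, 3))
```

A feature whose one-variable model reaches a lower log loss separates the outcome better, so features at the top of the ranking are the candidates to keep. An uninformative feature stays close to the loss of a constant 0.5 prediction, which is ln 2.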