A distributed sparse logistic regression with $L_{1/2}$ regularization for microarray biomarker discovery in cancer classification

Cited by: 0
Authors
Ning Ai
Ziyi Yang
Haoliang Yuan
Dong Ouyang
Rui Miao
Yuhan Ji
Yong Liang
Affiliations
[1] Macau University of Science and Technology, School of Computer Science and Engineering, Faculty of Innovation Engineering
[2] Tencent Quantum Lab, School of Automation
[3] Guangdong University of Technology
[4] Peng Cheng Laboratory
Keywords
Microarray data integration; Regularized logistic regression; ADMM algorithm; Cancer classification; Gene selection
DOI
10.1007/s00500-022-07551-5
Abstract
Microarray is a high-throughput sequencing technology that can be used to classify cancer types and to select highly relevant cancer biomarkers (i.e., genes). To make better use of the ever-increasing volume of microarray data, data-integrative analysis has become an active research direction. However, the complexity of gene expression data still poses several challenges for data integration methods: (1) selecting relevant biomarkers across multiple high-dimensional datasets; (2) batch effects between datasets; (3) high noise in features and samples; and (4) the high computational cost of large-scale data analysis. To overcome these challenges, we propose a novel Distribute-based Biological data-Integrative Analysis model, DBIA. DBIA is based on the $L_{1/2}$ regularized logistic regression ($L_{1/2}$ LR) model and the alternating direction method of multipliers (ADMM) for data integration. Regularization is an effective way to select latent cancer-relevant genes and to improve the accuracy of cancer classification, and we adopt the $L_{1/2}$ LR model to reduce the noise and dimensionality of the data. ADMM is employed to reduce the batch effects between datasets, analyze multiple datasets in parallel, and lower the computational cost of large-scale data analysis. Experimental results on simulated and real-world datasets demonstrate that DBIA achieves good prediction performance with shorter running time, lower hardware requirements, and strong robustness. The genes selected by DBIA have some biological significance.
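The combination of $L_{1/2}$ regularized logistic regression and ADMM described in the abstract can be illustrated with a minimal consensus-ADMM sketch. This is not the authors' DBIA implementation; it is an illustration under stated assumptions: each dataset keeps its own coefficient vector, a shared sparse vector is updated with the closed-form half-thresholding operator commonly used for $L_{1/2}$ penalties, and the local subproblems are solved approximately by a few gradient steps. The function names (half_threshold, local_update, consensus_admm) and the default values of lam, rho, steps, and lr are hypothetical choices made for the example.

```python
import numpy as np


def half_threshold(x, lam):
    """Elementwise minimizer of (z - x)^2 + lam * |z|^(1/2) (half-thresholding)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    # Entries at or below this cutoff are set exactly to zero.
    cutoff = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    big = np.abs(x) > cutoff
    phi = np.arccos((lam / 8.0) * (np.abs(x[big]) / 3.0) ** -1.5)
    out[big] = (2.0 / 3.0) * x[big] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0)
    )
    return out


def local_update(X, y, z, u, rho, steps=20, lr=0.1):
    """Approximate local step: logistic loss plus (rho/2) * ||w - z + u||^2.

    Assumption: a fixed number of plain gradient steps stands in for an
    exact subproblem solver. Labels y are expected in {0, 1}.
    """
    w = z.copy()
    n = X.shape[0]
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - y) / n + rho * (w - z + u)
        w -= lr * grad
    return w


def consensus_admm(datasets, lam=0.05, rho=1.0, iters=50):
    """Consensus ADMM over a list of (X, y) datasets sharing the same features."""
    k = len(datasets)
    d = datasets[0][0].shape[1]
    z = np.zeros(d)        # shared sparse coefficients
    W = np.zeros((k, d))   # per-dataset coefficients
    U = np.zeros((k, d))   # scaled dual variables
    for _ in range(iters):
        # Local updates are independent, so they could run in parallel.
        for i, (X, y) in enumerate(datasets):
            W[i] = local_update(X, y, z, U[i], rho)
        # Global update: minimize lam * sum|z|^(1/2) + (k*rho/2) * ||z - v||^2.
        v = (W + U).mean(axis=0)
        z = half_threshold(v, 2.0 * lam / (k * rho))
        # Dual update keeps each local copy tied to the shared vector.
        U += W - z
    return z
```

Calling consensus_admm with a list of (X, y) pairs, one per dataset, returns a single coefficient vector whose nonzero entries play the role of the selected genes. Because the $L_{1/2}$ penalty is non-convex, a scheme like this is a heuristic without a general convergence guarantee, and the selected set can depend on lam, rho, and the initialization.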
Pages: 2537-2552
Number of pages: 15