Fuzzy measure with regularization for gene selection and cancer prediction

Cited by: 10
Authors
Wang, JinFeng [1]
He, ZhenYu [1]
Huang, ShuaiHui [1]
Chen, Hao [1]
Wang, WenZhong [2]
Pourpanah, Farhad [3]
Affiliations
[1] South China Agr Univ, Coll Math & Informat, Guangzhou 510642, Peoples R China
[2] South China Agr Univ, Coll Econ & Management, Guangzhou 510642, Peoples R China
[3] Shenzhen Univ, Coll Math & Stat, Guangdong Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
Keywords
Gene selection; Fuzzy measure; Regularization; Cancer classification; Sparse logistic regression; Microarray data; Adaptive lasso; Classification; Representation
DOI
10.1007/s13042-021-01319-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling high-dimensional gene expression data is challenging, and selecting multiple informative subsets of genes is crucial for cancer classification. Many statistical and machine learning methods with regularization have been developed for this purpose. However, these methods neglect epistasis, i.e., the fact that some genes may mask or affect other genes. In this article, we propose a fuzzy measure with regularization (FMR), which adopts the L1 and L1/2 norms to obtain sparse solutions, to describe the interaction between genes. Regularization with L1 and L1/2 yields a series of sparse solutions, which helps solve for the fuzzy measure faster than traditional methods such as the genetic algorithm. FMR obtains a subset of genes corresponding to the fewest nonzero fuzzy measure values and then selects the important gene(s) according to their frequency of appearance in the selected gene subsets. In addition, three base classifiers (SVM, KNN and DBN) are employed as underlying models to verify the effectiveness of the selected subset(s) of genes. Experimental results indicate that the genes selected by FMR are consistent with several clinical studies. Moreover, FMR achieves accuracy comparable to that of other methods reported in the literature. The code used in this article is freely available at: https://github.com/wangphoenix/ICMLC.
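The final selection step described in the abstract, ranking genes by how often they appear in the selected subsets, can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that); the function name `rank_genes_by_frequency` and the gene labels are hypothetical placeholders, and the subsets are assumed to have already been produced by the regularized fuzzy measure step.

```python
from collections import Counter


def rank_genes_by_frequency(selected_subsets):
    """Count how often each gene appears across the selected gene subsets."""
    # Use set(subset) so a gene is counted at most once per subset.
    counts = Counter(gene for subset in selected_subsets for gene in set(subset))
    # Sort by descending frequency, then by name for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical subsets whose fuzzy measure values are nonzero
# (placeholder gene names, not data from the article).
subsets = [("g1", "g3"), ("g3",), ("g2", "g3"), ("g1", "g2")]
ranking = rank_genes_by_frequency(subsets)
```

Under this assumption, genes at the top of `ranking` are the candidates that would be passed to a base classifier for verification.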
Pages: 2389-2405
Number of pages: 17