Multiple similarly-well solutions exist for biomedical feature selection and classification problems

Cited by: 18
Authors
Liu, Jiamei [1]
Xu, Cheng [1]
Yang, Weifeng [1]
Shu, Yayun [1]
Zheng, Weiwei [2,3]
Zhou, Fengfeng [1,2,3]
Affiliations
[1] Jilin Univ, Coll Software, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Minist Educ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
[3] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
Source
SCIENTIFIC REPORTS | 2017 / Vol. 7
Keywords
DETERMINISTIC ALGORITHM; EXPRESSION; PROGRESSION; ADENOMA; CANCER
DOI
10.1038/s41598-017-13184-8
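Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract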
Binary classification is widely used to support decisions on biomedical big-data questions, such as clinical drug trials comparing treated participants with controls, and genome-wide association studies (GWASs) comparing participants with and without a phenotype. For this purpose, a machine learning model is trained by optimizing its power to discriminate between samples from the two groups. However, most classification algorithms tend to generate a single locally optimal solution, determined by the input dataset and the mathematical assumptions made about it. Here we demonstrate, from the perspectives of both disease classification and feature selection, that multiple different solutions may achieve similar classification performance. Existing machine learning algorithms may therefore have ignored a whole school of fish by catching only one good one. Since most existing machine learning algorithms generate a solution by optimizing a mathematical goal, understanding the biological mechanisms behind the classification question under investigation may require considering both the generated solution and the ignored ones.
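The abstract's central claim, that several different feature subsets can classify about equally well, can be illustrated with a minimal sketch. This is not the authors' method or data: the synthetic dataset, the nearest-centroid classifier, the tolerance threshold, and the names `make_data`, `centroid_accuracy`, and `similar_solutions` are all assumptions chosen for illustration.

```python
import itertools
import random


def make_data(n_per_class=100, n_features=6, shift=1.5, seed=0):
    """Synthetic two-class data (an assumption, not the paper's datasets).

    Every feature carries the same moderate signal, so many feature
    subsets are expected to be comparably informative.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Test accuracy of a nearest-centroid classifier using only `features`."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, t in zip(X_test, y_test):
        # Squared Euclidean distance to each class centroid.
        dist = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
            for label in (0, 1)
        }
        correct += min(dist, key=dist.get) == t
    return correct / len(y_test)


def similar_solutions(subset_size=2, tolerance=0.05, n_features=6):
    """Score every feature subset and keep those within `tolerance` of the best."""
    X_train, y_train = make_data(n_features=n_features, seed=0)
    X_test, y_test = make_data(n_features=n_features, seed=1)
    scores = {
        subset: centroid_accuracy(X_train, y_train, X_test, y_test, subset)
        for subset in itertools.combinations(range(n_features), subset_size)
    }
    best = max(scores.values())
    near_best = {s: a for s, a in scores.items() if best - a <= tolerance}
    return scores, near_best


if __name__ == "__main__":
    scores, near_best = similar_solutions()
    print(f"{len(near_best)} of {len(scores)} subsets are within tolerance of the best")
    for subset, acc in sorted(near_best.items(), key=lambda kv: -kv[1]):
        print(subset, round(acc, 3))
```

An optimizer that returns only the top-scoring subset would report one entry of `near_best` and silently discard the rest, which is the situation the abstract describes.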
Pages: 10