Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data

Cited by: 54
Authors
Fang, Liying [1, 2, 3]
Zhao, Han [1, 2, 3]
Wang, Pu [1, 2, 3]
Yu, Mingwei [4]
Yan, Jianzhuo [1, 2, 3]
Cheng, Wenshuai [1, 2, 3]
Chen, Peiyu [1, 2, 3]
Affiliations
[1] Beijing Univ Technol, Coll Elect Informat & Control Engn, Beijing 100124, Peoples R China
[2] Minist Educ, Engn Res Ctr Digital Community, Beijing 100124, Peoples R China
[3] Beijing Key Lab Computat Intelligence & Intellige, Beijing 100124, Peoples R China
[4] CPUMS, Hosp Tradit Chinese Med, Beijing 100010, Peoples R China
Keywords
Multidimensional time series; Dimension reduction; Feature selection; Mutual information; Class separability; CLASSIFICATION; VARIABLES;
DOI
10.1016/j.bspc.2015.05.011
Chinese Library Classification (CLC) number
R318 [Biomedical engineering];
Discipline classification code
0831;
Abstract
In clinical medicine, multidimensional time series data can be mined with techniques such as classification and prediction to discover patterns of disease progression. However, in multidimensional time series data mining, high data dimensionality makes probability density estimates inaccurate and increases computational complexity. In addition, redundant information and irrelevant features can lead to high computational complexity and over-fitting. Together, these two factors can reduce classification performance. To reduce computational complexity and to eliminate redundant information and irrelevant features, we improved upon a multidimensional time series feature selection method to achieve dimension reduction. The improved method selects features by combining feature extraction based on mutual information, estimated with the Kozachenko-Leonenko (K-L) information entropy estimation method, with a feature selection algorithm based on class separability. We performed experiments on an electroencephalogram (EEG) dataset for verification and on a non-small cell lung cancer (NSCLC) clinical dataset for application. The results show that, compared with CLeVer, Corona and AGV, the improved method can effectively reduce the dimensions of multidimensional time series for clinical data. © 2015 The Authors. Published by Elsevier Ltd.
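The mutual-information step named in the abstract can be illustrated with a minimal sketch. It assumes the standard Kozachenko-Leonenko k-nearest-neighbour entropy estimator with the max-norm, and it obtains mutual information as I(X;Y) = H(X) + H(Y) - H(X,Y). The function names (`digamma_int`, `kl_entropy`, `mutual_information`), the choice k = 3, and the synthetic Gaussian data are illustrative assumptions, not the authors' implementation. The class-separability step is not shown.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy(points, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) using the max-norm.

    points: list of equal-length tuples, one tuple per sample.
    """
    n, d = len(points), len(points[0])
    log_radius_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force distances from p to every other sample (O(n^2) overall).
        dists = sorted(
            max(abs(a - b) for a, b in zip(p, q))
            for j, q in enumerate(points) if j != i
        )
        # Distance to the k-th nearest neighbour; the floor avoids log(0) on ties.
        log_radius_sum += math.log(max(dists[k - 1], 1e-12))
    # A max-norm ball of radius r has volume (2r)^d, hence the d*log(2) term.
    return (digamma_int(n) - digamma_int(k)
            + d * math.log(2.0) + d * log_radius_sum / n)


def mutual_information(xs, ys, k=3):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each term estimated with kl_entropy."""
    hx = kl_entropy([(x,) for x in xs], k)
    hy = kl_entropy([(y,) for y in ys], k)
    hxy = kl_entropy(list(zip(xs, ys)), k)
    return hx + hy - hxy


# Synthetic data (an assumption for illustration, not clinical data).
random.seed(0)
n = 300
feature = [random.gauss(0.0, 1.0) for _ in range(n)]
relevant = [x + 0.3 * random.gauss(0.0, 1.0) for x in feature]  # depends on feature
irrelevant = [random.gauss(0.0, 1.0) for _ in range(n)]         # independent of feature

mi_relevant = mutual_information(feature, relevant)
mi_irrelevant = mutual_information(feature, irrelevant)
print(f"MI with dependent variable:   {mi_relevant:.3f}")
print(f"MI with independent variable: {mi_irrelevant:.3f}")
```

In this sketch a variable that depends on the feature should receive a clearly higher mutual-information score than an independent one, which is the property a mutual-information-based ranking relies on. The brute-force neighbour search is quadratic in the number of samples, so a larger dataset would call for a spatial index.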
Pages: 82-89
Page count: 8