Feature Selection for the Classification of Longitudinal Human Ageing Data

Cited by: 7
Authors
Pomsuwan, Tossapol [1]
Freitas, Alex A. [1]
Affiliations
[1] Univ Kent, Sch Comp, Canterbury, Kent, England
Source
2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2017) | 2017
Keywords
classification; feature selection; longitudinal data; age-related diseases
DOI
10.1109/ICDMW.2017.102
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new variant of the Correlation-based Feature Selection (CFS) method for coping with longitudinal data, where variables are repeatedly measured across different time points. The proposed CFS variant is evaluated on ten datasets created using data from the English Longitudinal Study of Ageing (ELSA), with different age-related diseases used as the class variables to be predicted. The results show that, overall, the proposed CFS variant leads to better predictive performance than both the standard CFS and the baseline approach of no feature selection, when using Naive Bayes and J48 decision tree induction as classification algorithms (although the difference in performance is very small in the results for J48). We also report the most relevant features selected by J48 across the datasets.
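For orientation, the merit score used by standard CFS can be sketched as below. This is only an illustration of the standard method that the abstract names as a point of comparison, not the longitudinal variant the paper proposes, whose details the abstract does not give. The function names `pearson` and `cfs_merit` are hypothetical, and absolute Pearson correlation is assumed as the correlation measure (other measures, such as symmetrical uncertainty, are commonly used for discrete features).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(features, target):
    """Standard CFS merit of a feature subset.

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    where k is the subset size, r_cf the mean absolute feature-class
    correlation, and r_ff the mean absolute feature-feature correlation.
    Assumption: Pearson correlation is used as the correlation measure.
    """
    k = len(features)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Under this score, a subset is rewarded for features that correlate with the class and penalised for features that correlate with each other, so adding an exact duplicate of a feature already in the subset does not raise the merit.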
Pages: 739-746
Page count: 8