A hybrid intelligent model of analyzing clinical breast cancer data using clustering techniques with feature selection

Times cited: 49
Author
Chen, Chien-Hsing [1]
Affiliation
[1] Ling Tung Univ, Dept Informat Management, Taichung, Taiwan
Keywords
Breast cancer diagnoses; Feature selection; Cluster analysis; Filter model; Wrapper model
DOI
10.1016/j.asoc.2013.10.024
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Models based on data mining and machine learning techniques have been developed to detect breast cancer early or to assist in clinical breast cancer diagnosis. Feature selection is commonly applied to improve the performance of such models. Feature selection has been studied extensively, but most studies address supervised learning. When class labels are absent, unsupervised feature selection methods are required, yet these have received comparatively little attention in the literature. This paper presents a hybrid intelligence model that combines cluster analysis techniques with feature selection to analyze clinical breast cancer diagnoses. The model offers the option of clustering on a selected subset of salient features, while still comprehensively accounting for the approach of most existing models, which use all features to perform clustering. In particular, we study methods that select salient features for identifying clusters and compare them using coincident quantitative measurements. On benchmark breast cancer datasets, experimental results indicate that our method outperforms several benchmark filter- and wrapper-based methods in selecting features for discovering natural clusters, that is, in maximizing between-cluster scatter and minimizing within-cluster scatter to achieve satisfactory clustering quality. (C) 2013 Elsevier B.V. All rights reserved.
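The abstract does not give the algorithm itself. As a rough illustration of the wrapper idea it describes (scoring a candidate feature subset by the quality of the clusters it produces, where quality means large between-cluster scatter relative to within-cluster scatter), here is a minimal pure-Python sketch. It is not the author's method. The choice of k-means as the clusterer, the trace(S_B)/trace(S_W) criterion, greedy forward search, and all function names (`standardize`, `kmeans`, `scatter_ratio`, `forward_select`) are assumptions made for illustration.

```python
import random


def standardize(X):
    """Return a z-scored copy of X (a list of rows) so no feature dominates distances."""
    n, d = len(X), len(X[0])
    Z = [list(map(float, row)) for row in X]
    for f in range(d):
        col = [row[f] for row in Z]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constant columns
        for row in Z:
            row[f] = (row[f] - mu) / sd
    return Z


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_of(rows):
    """Column-wise mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns the labels of the lowest-SSE run."""
    rng = random.Random(seed)
    best_labels, best_sse = None, float("inf")
    for _restart in range(n_init):
        centers = [list(c) for c in rng.sample(X, k)]
        labels = None
        for _step in range(n_iter):
            # Assignment step: each point goes to its nearest center.
            new = [min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in X]
            if new == labels:
                break  # assignments stopped changing: converged
            labels = new
            # Update step: move each center to the mean of its members.
            for j in range(k):
                members = [x for x, lab in zip(X, labels) if lab == j]
                if members:  # keep the old center if a cluster empties
                    centers[j] = mean_of(members)
        sse = sum(sq_dist(x, centers[lab]) for x, lab in zip(X, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def scatter_ratio(X, labels):
    """trace(S_B) / trace(S_W): larger means better-separated, tighter clusters."""
    overall = mean_of(X)
    s_b = s_w = 0.0
    for j in set(labels):
        members = [x for x, lab in zip(X, labels) if lab == j]
        mu = mean_of(members)
        s_b += len(members) * sq_dist(mu, overall)    # between-cluster scatter
        s_w += sum(sq_dist(x, mu) for x in members)   # within-cluster scatter
    return s_b / s_w if s_w > 0 else float("inf")


def forward_select(X, k, max_features=None):
    """Greedy wrapper: add the feature whose clustering scores best; stop when no gain."""
    Z = standardize(X)
    d = len(Z[0])
    max_features = max_features or d
    selected, best = [], float("-inf")
    while len(selected) < max_features:
        scores = {}
        for f in range(d):
            if f in selected:
                continue
            cols = selected + [f]
            sub = [[row[c] for c in cols] for row in Z]
            scores[f] = scatter_ratio(sub, kmeans(sub, k))
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:
            break  # no remaining feature improves the criterion
        selected.append(f_best)
        best = scores[f_best]
    return selected, best
```

A call such as `forward_select(rows, k=2)` (with `rows` a hypothetical list of numeric clinical records) returns the chosen column indices and their score. Greedy forward search is cheap but can miss features that only help in combination. Scatter-based criteria are also sensitive to the number of features, so comparing subsets of different sizes may call for some normalization. A filter model would instead rank features by a clustering-independent statistic, without running the clusterer inside the search loop.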
Pages: 4-14
Number of pages: 11