A hybrid intelligent model of analyzing clinical breast cancer data using clustering techniques with feature selection

Cited by: 49
Authors
Chen, Chien-Hsing [1]
Affiliations
[1] Ling Tung Univ, Dept Informat Management, Taichung, Taiwan
Keywords
Breast cancer diagnoses; Feature selection; Cluster analysis; Filter model; Wrapper model
DOI
10.1016/j.asoc.2013.10.024
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Models based on data mining and machine learning techniques have been developed to detect breast cancer early or to assist in clinical breast cancer diagnoses. Feature selection is commonly applied to improve the performance of such models. The literature on feature selection is extensive, but most studies address supervised learning. When class labels are absent, unsupervised feature selection methods are required, and few studies address them. This paper presents a hybrid intelligent model that combines cluster analysis techniques with feature selection for analyzing clinical breast cancer diagnoses. The model offers the option of selecting a subset of salient features for clustering, and it also accommodates the approach of most existing models, which use all the features to perform clustering. In particular, we study methods that select salient features to identify clusters by comparing coincident quantitative measurements. When applied to benchmark breast cancer datasets, experimental results indicate that our method outperforms several benchmark filter- and wrapper-based methods in selecting features used to discover natural clusters, maximizing the between-cluster scatter and minimizing the within-cluster scatter toward a satisfactory clustering quality. (C) 2013 Elsevier B.V. All rights reserved.
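The scatter criterion mentioned in the abstract can be illustrated with a small sketch. This is not the paper's hybrid model: it is a generic wrapper-style comparison, assuming a plain k-means clusterer and the ratio of between-cluster to within-cluster scatter as the quality score. The function names (`kmeans`, `scatter_ratio`), the synthetic data, and the candidate subsets are all hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in clusterer, not the paper's)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def scatter_ratio(X, labels):
    """Between-cluster scatter divided by within-cluster scatter (higher is better)."""
    overall = X.mean(axis=0)
    within, between = 0.0, 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        center = members.mean(axis=0)
        within += ((members - center) ** 2).sum()
        between += len(members) * ((center - overall) ** 2).sum()
    return between / within


# Synthetic data (hypothetical): features 0 and 1 carry two clusters,
# features 2 and 3 are pure noise.
rng = np.random.default_rng(42)
informative = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(6.0, 1.0, size=(100, 2)),
])
noise = rng.normal(0.0, 2.0, size=(200, 2))
X = np.hstack([informative, noise])

# Wrapper-style evaluation: cluster on each candidate subset and score the result.
candidates = [(0, 1), (2, 3), (0, 1, 2, 3)]
scores = {}
for subset in candidates:
    X_sub = X[:, list(subset)]
    scores[subset] = scatter_ratio(X_sub, kmeans(X_sub, k=2))

best = max(scores, key=scores.get)
print("scatter ratio per subset:", scores)
print("selected subset:", best)
```

In this toy setting the subset holding only the informative features should score highest, because the noise features add within-cluster scatter without adding between-cluster scatter. The raw trace ratio is not normalized for dimensionality, so comparing subsets of different sizes on real data would likely need a normalized criterion.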
Pages: 4-14
Page count: 11
Related papers
32 entries in total
[1]  
[Anonymous], 1992, Decision Tree Construction via Linear Programming
[2]  
[Anonymous], J STAT COMPUTATION S
[3]  
[Anonymous], 2011, Pei. data mining concepts and techniques
[4]   FINE: Fisher Information Nonparametric Embedding [J].
Carter, Kevin M. ;
Raich, Raviv ;
Finn, William G. ;
Hero, Alfred O., III .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (11) :2093-U195
[5]  
Chen CH, 2011, LECT NOTES COMPUT SC, V6729, P269, DOI 10.1007/978-3-642-21524-7_32
[6]   A Logistic Regression Model Based on the National Mammography Database Format to Aid Breast Cancer Diagnosis [J].
Chhatwal, Jagpreet ;
Alagoz, Oguzhan ;
Lindstrom, Mary J. ;
Kahn, Charles E., Jr. ;
Shaffer, Katherine A. ;
Burnside, Elizabeth S. .
AMERICAN JOURNAL OF ROENTGENOLOGY, 2009, 192 (04) :1117-1127
[7]   A new feature selection scheme using a data distribution factor for unsupervised nominal data [J].
Chow, Tommy W. S. ;
Wang, Piyang ;
Ma, Eden W. M. .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (02) :499-509
[8]   Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information [J].
Chow, TWS ;
Huang, D .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2005, 16 (01) :213-224
[9]   CLUSTER SEPARATION MEASURE [J].
DAVIES, DL ;
BOULDIN, DW .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (02) :224-227
[10]   Unsupervised feature selection applied to content-based retrieval of lung images [J].
Dy, JG ;
Brodley, CE ;
Kak, A ;
Broderick, LS ;
Aisen, AM .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2003, 25 (03) :373-378