Selection of Variables for Cluster Analysis and Classification Rules

Cited by: 46
Authors
Fraiman, Ricardo [1, 2]
Justel, Ana [3]
Svarc, Marcela [1]
Affiliations
[1] Univ San Andres, Buenos Aires, DF, Argentina
[2] Univ Republica, Ctr Matemat, Montevideo, Uruguay
[3] Univ Autonoma Madrid, Madrid, Spain
Keywords
Finding relevant variables; Forward-backward algorithm; Pattern recognition
DOI
10.1198/016214508000000544
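Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714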
Abstract
In this article we introduce two procedures for variable selection in cluster analysis and classification rules. One is mainly aimed at detecting the "noisy" noninformative variables, while the other also deals with multicollinearity and general dependence. Both methods are designed to be used after a "satisfactory" grouping procedure has been carried out. A forward-backward algorithm is proposed to make such procedures feasible in large datasets. A small simulation is performed and some real data examples are analyzed.
Pages: 1294-1303
Page count: 10