A mixed integer linear model for clustering with variable selection

被引:19
作者
Benati, Stefano [1 ]
Garcia, Sergio [2 ]
机构
[1] Univ Trento, Dipartimento Sociol & Ric Sociale, Trento, Italy
[2] Univ Kent, Kent Business Sch, Chatham, Kent, England
关键词
Clustering; p-median; Variable selection; Radius formulation; ALGORITHM;
D O I
10.1016/j.cor.2013.10.005
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
This paper introduces an extension of the p-median problem in which the distance function between units is calculated as the distance sum on the q most important variables out of a set of size m. This model has applications in cluster analysis (for example, in sociological surveys), where analysts have a large list of variables available for inclusion, but only a subset of them (true variables) is appropriate for uncovering the cluster structure. Therefore, researchers must carefully separate the true variables from the other before computing data partitions. Here we show that this problem can be formulated as a mixed integer non-linear optimization model where clustering and variable selection are done simultaneously. Then we provide two different linearizations and compare their performance with the default method of clustering with all the variables (which is a p-median model) on a set of artificially generated binary data, showing that the model based on a radius formulation performs the best. (C) 2013 Elsevier Ltd. All rights reserved.
引用
收藏
页码:280 / 285
页数:6
相关论文
共 22 条
[1]  
Benati S, 2012, P MEDIAN PROBLEM DIS
[3]   The Academic Journal Ranking Problem: A Fuzzy-Clustering Approach [J].
Benati, Stefano ;
Stefani, Silvana .
JOURNAL OF CLASSIFICATION, 2011, 28 (01) :7-20
[4]   Clustering binary data in the presence of masking variables [J].
Brusco, MJ .
PSYCHOLOGICAL METHODS, 2004, 9 (04) :510-523
[5]   A METHOD FOR COMPARING 2 HIERARCHICAL CLUSTERINGS [J].
FOWLKES, EB ;
MALLOWS, CL .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1983, 78 (383) :553-569
[6]   Clustering objects on subsets of attributes [J].
Friedman, JH ;
Meulman, JJ .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2004, 66 :815-839
[7]   New formulation and a branch-and-cut algorithm for the multiple allocation p-hub median problem [J].
Garcia, Sergio ;
Landete, Mercedes ;
Marin, Alfredo .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2012, 220 (01) :48-57
[8]   Solving Large p-Median Problems with a Radius Formulation [J].
Garcia, Sergio ;
Labbe, Martine ;
Marin, Alfredo .
INFORMS JOURNAL ON COMPUTING, 2011, 23 (04) :546-556
[9]  
Goldengorin B, 2013, HDB COMBINATORIAL OP, P929
[10]   Cluster analysis and mathematical programming [J].
Hansen, P ;
Jaumard, B .
MATHEMATICAL PROGRAMMING, 1997, 79 (1-3) :191-215