On the selection of the globally optimal prototype subset for nearest-neighbor classification

被引:8
作者
Carrizosa, Emilio [1 ]
Martin-Barragan, Belen
Plastria, Frank
Morales, Dolores Romero
机构
[1] Univ Seville, Fac Matemat, Seville 41012, Spain
[2] Univ Carlos III Madrid, Dept Estadist, Madrid 28903, Spain
[3] Vrije Univ Brussels, Dept Math Operat Res Stat & Informat Syst Managem, MOSI, B-1050 Brussels, Belgium
[4] Univ Oxford, Said Sch Business, Oxford OX1 1HP, England
关键词
classification; optimal prototype subset; nearest neighbor; dissimilarities; integer programming; variable neighborhood search; missing values;
D O I
10.1287/ijoc.1060.0183
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.
引用
收藏
页码:470 / 479
页数:10
相关论文
共 39 条
  • [1] ISSUES IN SEARCHING MOLECULAR SEQUENCE DATABASES
    ALTSCHUL, SF
    BOGUSKI, MS
    GISH, W
    WOOTTON, JC
    [J]. NATURE GENETICS, 1994, 6 (02) : 119 - 129
  • [2] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [3] The filtered nearest neighbor method for generating low-discrepancy sequences
    Bennett, MV
    Willemain, TR
    [J]. INFORMS JOURNAL ON COMPUTING, 2004, 16 (01) : 68 - 72
  • [4] Nearest prototype classifier designs: An experimental study
    Bezdek, JC
    Kuncheva, LI
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2001, 16 (12) : 1445 - 1473
  • [5] BLAKE C, 1998, UCI REPOSITORY MACH
  • [6] Advances in instance selection for instance-based learning algorithms
    Brighton, H
    Mellish, C
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2002, 6 (02) : 153 - 172
  • [7] CARRIZOSA E, 2005, RM02027 U MASSTR
  • [8] Cochran WG., 1963, SAMPLING TECHNIQUES
  • [9] NEAREST NEIGHBOR PATTERN CLASSIFICATION
    COVER, TM
    HART, PE
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) : 21 - +
  • [10] Cristianini N., 2000, Intelligent Data Analysis: An Introduction