Nearest-neighbor classification with categorical variables

Cited by: 12
Author
Buttrey, SE [1]
Affiliation
[1] USN, Postgrad Sch, Dept Operat Res Sb, Monterey, CA 93943 USA
Keywords
optimal scaling; cross-validation; Fisher's criterion; choice of metric
DOI
10.1016/S0167-9473(98)00032-2
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
A technique is presented for adapting nearest-neighbor classification to the case of categorical variables. The set of categories is mapped onto the real line in such a way as to maximize the ratio of the total sum of squares to the within-class sum of squares, aggregated over classes. The resulting real values then replace the categories, and nearest-neighbor classification proceeds with the Euclidean metric on these new values. Continuous variables can be included in this scheme with little added effort. This approach has been implemented in a computer program and tried on a number of data sets, with encouraging results.

Nearest-neighbor classification is a well-known and effective classification technique. In this scheme, the distances from an unknown item to all known items are measured, and the unknown item's class is estimated by the class of its nearest neighbor or by the class most often represented among a set of nearest neighbors. This has proven effective in many examples, but an appropriate normalization of distances is required when variables are measured on different scales. For categorical variables, "distance" is not even defined. In this paper, categorical data values are replaced by real numbers in an optimal way, and those real numbers are then used in nearest-neighbor classification.

(C) 1998 Elsevier Science B.V. All rights reserved.
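The scale-then-classify idea in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the author's program. For one categorical variable with category scores s, it writes the total and within-class sums of squares as quadratic forms s^T T s and s^T W s and takes the top generalized eigenvector of (T, W). Because W is singular (constant scores give zero spread), it adds a small `ridge` term, which is our assumption. It also scores each categorical column independently and maps categories unseen in training to 0. The names `optimal_scores`, `fit_scalings`, `transform`, `knn_predict` and `ridge` are hypothetical, and continuous variables are not handled.

```python
import numpy as np


def optimal_scores(x, y, ridge=1e-6):
    """Map each category of one categorical variable to a real score.

    The scores s (one per category) approximately maximize
    (total sum of squares) / (within-class sum of squares) of the scored
    values, via the top generalized eigenvector of (T, W + ridge*I).
    """
    cats, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    counts = np.zeros((yi.max() + 1, len(cats)))
    np.add.at(counts, (yi, xi), 1.0)  # counts[k, c]: items in class k with category c
    n_c = counts.sum(axis=0)
    n_k = counts.sum(axis=1)
    # Quadratic forms: s @ total @ s = total SS, s @ within @ s = within-class SS.
    total = np.diag(n_c) - np.outer(n_c, n_c) / n_c.sum()
    within = np.diag(n_c) - counts.T @ (counts / n_k[:, None])
    # Ridge (an assumption of this sketch) makes W positive definite.
    L = np.linalg.cholesky(within + ridge * np.eye(len(cats)))
    L_inv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(L_inv @ total @ L_inv.T)
    s = L_inv.T @ vecs[:, -1]  # back-transform eigenvector of the largest eigenvalue
    # T/W is unchanged by shifting or rescaling s: center and standardize.
    s = s - (n_c @ s) / n_c.sum()
    sd = np.sqrt((n_c @ s**2) / n_c.sum())
    if sd > 0:
        s = s / sd
    return dict(zip(cats.tolist(), s.tolist()))


def fit_scalings(X, y):
    """Score each categorical column separately (an assumption of this sketch)."""
    X = np.asarray(X)
    return [optimal_scores(X[:, j], y) for j in range(X.shape[1])]


def transform(X, scalings):
    """Replace categories by scores; unseen categories map to 0.0 (hypothetical choice)."""
    return np.array(
        [[sc.get(v, 0.0) for v, sc in zip(row, scalings)] for row in np.asarray(X).tolist()],
        dtype=float,
    )


def knn_predict(train_Z, train_y, test_Z, k=1):
    """Majority vote among the k nearest training items.

    Squared Euclidean distance is used; it gives the same ordering as Euclidean.
    """
    d = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    preds = []
    for idx in nearest:
        labels, votes = np.unique(train_y[idx], return_counts=True)
        preds.append(labels[np.argmax(votes)])  # ties go to the smallest label
    return np.array(preds)


if __name__ == "__main__":
    X = np.array([["a"], ["a"], ["c"], ["b"], ["b"], ["c"]])
    y = np.array([0, 0, 0, 1, 1, 1])
    scalings = fit_scalings(X, y)
    Z = transform(X, scalings)
    print(knn_predict(Z, y, transform([["a"], ["b"]], scalings), k=1))  # → [0 1]
```

In this toy data, category "a" occurs only in class 0, "b" only in class 1, and "c" in both. The sketch places "a" and "b" symmetrically on opposite sides of 0 and "c" at 0. Since T = B + W, where B is the between-class sum of squares, maximizing T/W is equivalent to maximizing B/W (Fisher's criterion). The centering and unit-variance normalization is our choice for comparability across columns, not something the abstract specifies.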
Pages: 157-169
Page count: 13
Related papers
(50 in total)
  • [1] CHOICE OF NEIGHBOR ORDER IN NEAREST-NEIGHBOR CLASSIFICATION
    Hall, Peter
    Park, Byeong U.
    Samworth, Richard J.
    ANNALS OF STATISTICS, 2008, 36 (05): 2135-2152
  • [2] Prototype optimization for nearest-neighbor classification
    Huang, YS
    Chiang, CC
    Shieh, JW
    Grimson, E
    PATTERN RECOGNITION, 2002, 35 (06): 1237-1245
  • [3] A Bayesian Reassessment of Nearest-Neighbor Classification
    Cucala, Lionel
    Marin, Jean-Michel
    Robert, Christian P.
    Titterington, D. M.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2009, 104 (485): 263-273
  • [4] Nearest-neighbor classification for facies delineation
    Tartakovsky, Daniel M.
    Wohlberg, Brendt
    Guadagnini, Alberto
    WATER RESOURCES RESEARCH, 2007, 43 (07)
  • [5] In defense of Nearest-Neighbor based image classification
    Boiman, Oren
    Shechtman, Eli
    Irani, Michal
    2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008: 1992-+
  • [6] Finding Relevant Points for Nearest-Neighbor Classification
    Eppstein, David
    2022 SYMPOSIUM ON SIMPLICITY IN ALGORITHMS, SOSA, 2022: 68-78
  • [7] Locally adaptive metric nearest-neighbor classification
    Domeniconi, C
    Peng, J
    Gunopulos, D
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (09): 1281-1285
  • [8] Fuzzy-rough nearest-neighbor classification approach
    Bian, HY
    Mazlack, L
    NAFIPS'2003: 22ND INTERNATIONAL CONFERENCE OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY - NAFIPS PROCEEDINGS, 2003: 500-505
  • [9] Integrating background knowledge into nearest-neighbor text classification
    Zelikovitz, S
    Hirsh, H
    ADVANCES IN CASE-BASED REASONING, 2002, 2416: 1-5
  • [10] Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors
    Ougiaroglou, Stefanos
    Nanopoulos, Alexandros
    Papadopoulos, Apostolos N.
    Manolopoulos, Yannis
    Welzer-Druzovec, Tatjana
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, PROCEEDINGS, 2007, 4690: 66-+