Incorporating prior probabilities into high-dimensional classifiers

Cited: 3
Authors
Hall, Peter [1 ]
Xue, Jing-Hao [2 ]
Affiliations
[1] Univ Melbourne, Dept Math & Stat, Melbourne, Vic 3010, Australia
[2] UCL, Dept Stat Sci, London WC1E 6BT, England
Keywords
Bagging; Bootstrap; Centroid-based classifier; Discrimination; Error rate; Genomics; Nearest-neighbour algorithms; Resampling; Support vector machine; Gene expression; Centroid classifier; Microarray data; Cancer; Ratios
DOI
10.1093/biomet/asp081
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters.
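The abstract describes the method only at a high level: prior probabilities are folded into a high-dimensional classifier via a resampling scheme derived from Breiman's bagging. As an illustration only, the sketch below shows one plausible shape such a procedure could take; the function name, the binomial draw of bootstrap class sizes from the priors, and the choice of a nearest-centroid base rule are all assumptions for this example, not the authors' actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_weighted_bagging_predict(X0, X1, x_new, priors=(0.5, 0.5), n_boot=200):
    """Classify x_new by majority vote over bootstrap resamples whose
    class proportions are drawn to match the given prior probabilities.
    Base learner here: a simple nearest-centroid rule (an assumption)."""
    n = len(X0) + len(X1)
    votes = 0
    for _ in range(n_boot):
        # Draw bootstrap class sizes from the priors, then resample
        # within each class with replacement.
        n1 = rng.binomial(n, priors[1])
        n0 = n - n1
        b0 = X0[rng.integers(0, len(X0), size=max(n0, 1))]
        b1 = X1[rng.integers(0, len(X1), size=max(n1, 1))]
        # Vote for class 1 if x_new is closer to the class-1 centroid.
        d0 = np.linalg.norm(x_new - b0.mean(axis=0))
        d1 = np.linalg.norm(x_new - b1.mean(axis=0))
        votes += int(d1 < d0)
    return int(votes > n_boot / 2)

# Toy high-dimensional setting: two Gaussian classes in 100 dimensions,
# with a new point located at the class-1 mean.
p = 100
X0 = rng.normal(0.0, 1.0, size=(20, p))
X1 = rng.normal(0.5, 1.0, size=(20, p))
print(prior_weighted_bagging_predict(X0, X1, np.full(p, 0.5), priors=(0.3, 0.7)))
```

Note that, consistent with the abstract's claim, the sketch requires no tuning parameters beyond the number of bootstrap replicates.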
Pages: 31-48
Number of pages: 18