Incorporating prior probabilities into high-dimensional classifiers

Cited by: 3
Authors
Hall, Peter [1 ]
Xue, Jing-Hao [2 ]
Affiliations
[1] Univ Melbourne, Dept Math & Stat, Melbourne, Vic 3010, Australia
[2] UCL, Dept Stat Sci, London WC1E 6BT, England
Keywords
Bagging; Bootstrap; Centroid-based classifier; Discrimination; Error rate; Genomics; Nearest-neighbour algorithms; Resampling; Support vector machine; GENE-EXPRESSION; CENTROID CLASSIFIER; MICROARRAY DATA; CANCER; RATIOS;
DOI
10.1093/biomet/asp081
Chinese Library Classification (CLC)
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters.
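The record does not reproduce the authors' actual procedure, so the following is only a generic, hypothetical sketch of the two ingredients the abstract names: a centroid-based classifier whose class scores are shifted by log prior probabilities, with predictions aggregated by majority vote over bootstrap resamples in the spirit of Breiman's bagging. The function names, the log-prior score adjustment, and the vote-based aggregation are illustrative assumptions, not the method proposed in the paper.

```python
import numpy as np


def nearest_centroid_predict(X, centroids, log_priors):
    """Score each class by negative half squared distance to its centroid,
    shifted by the class's log prior probability (illustrative choice)."""
    d = np.array([np.sum((X - c) ** 2, axis=1) for c in centroids])  # (k, n)
    scores = -0.5 * d + log_priors[:, None]
    return np.argmax(scores, axis=0)


def bagged_prior_classifier(X_train, y_train, X_test, priors,
                            n_boot=50, seed=0):
    """Aggregate prior-adjusted centroid classifiers over bootstrap
    resamples of the training data (a bagging-style sketch, not the
    paper's algorithm)."""
    rng = np.random.default_rng(seed)
    log_priors = np.log(np.asarray(priors, dtype=float))
    k = len(priors)
    n = len(y_train)
    votes = np.zeros((k, len(X_test)), dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # bootstrap resample with replacement
        Xb, yb = X_train[idx], y_train[idx]
        # Fall back to the full-sample centroid if a class is missing
        # from this resample.
        centroids = np.array([
            Xb[yb == j].mean(axis=0) if np.any(yb == j)
            else X_train[y_train == j].mean(axis=0)
            for j in range(k)
        ])
        pred = nearest_centroid_predict(X_test, centroids, log_priors)
        for j in range(k):
            votes[j] += (pred == j)
    return np.argmax(votes, axis=0)  # majority vote across resamples
```

With equal priors this reduces to an ordinary bagged nearest-centroid classifier; unequal priors tilt every bootstrap classifier toward the more probable class before the votes are pooled.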
Pages: 31 / 48
Page count: 18
Related papers
50 records in total
  • [31] Scale adjustments for classifiers in high-dimensional, low sample size settings
    Chan, Yao-Ban
    Hall, Peter
    BIOMETRIKA, 2009, 96 (02) : 469 - 478
  • [32] Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images
    Vieth, A.
    Vilanova, A.
    Lelieveldt, B.
    Eisemann, E.
    Hollt, T.
    2022 IEEE 15TH PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS 2022), 2022, : 11 - 20
  • [33] High-dimensional regression with potential prior information on variable importance
    Stokell, Benjamin G.
    Shah, Rajen D.
    STATISTICS AND COMPUTING, 2022, 32 (03)
  • [34] Variance Prior Forms for High-Dimensional Bayesian Variable Selection
    Moran, Gemma E.
    Rockova, Veronika
    George, Edward I.
    BAYESIAN ANALYSIS, 2019, 14 (04): : 1091 - 1119
  • [35] Inference of Multiple High-Dimensional Networks with the Graphical Horseshoe Prior
    Busatto, Claudio
    Stingo, Francesco Claudio
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2025,
  • [36] In Nonparametric and High-Dimensional Models, Bayesian Ignorability is an Informative Prior
    Linero, Antonio R.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (548) : 2785 - 2798
  • [38] Rank-based classifiers for extremely high-dimensional gene expression data
    Ludwig Lausser
    Florian Schmid
    Lyn-Rouven Schirra
    Adalbert F. X. Wilhelm
    Hans A. Kestler
    Advances in Data Analysis and Classification, 2018, 12 : 917 - 936
  • [39] Improved shrunken centroid classifiers for high-dimensional class-imbalanced data
    Rok Blagus
    Lara Lusa
    BMC BIOINFORMATICS, 2013, 14
  • [40] High-dimensional order parameters and neural network classifiers applied to amorphous ices
    Beaulieu, Zoe Faure
    Deringer, Volker L.
    Martelli, Fausto
    JOURNAL OF CHEMICAL PHYSICS, 2024, 160 (08):