Incorporating prior probabilities into high-dimensional classifiers

Cited by: 3
Authors
Hall, Peter [1]
Xue, Jing-Hao [2]
Affiliations
[1] Univ Melbourne, Dept Math & Stat, Melbourne, Vic 3010, Australia
[2] UCL, Dept Stat Sci, London WC1E 6BT, England
Keywords
Bagging; Bootstrap; Centroid-based classifier; Discrimination; Error rate; Genomics; Nearest-neighbour algorithms; Resampling; Support vector machine; GENE-EXPRESSION; CENTROID CLASSIFIER; MICROARRAY DATA; CANCER; RATIOS;
DOI
10.1093/biomet/asp081
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters.
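The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one way a bagging-style scheme could carry prior information into a centroid-based classifier. It assumes, and this is an assumption rather than the authors' stated procedure, that each bootstrap resample draws from the two training classes in proportion to their prior probabilities and that the final label is a majority vote. The names `centroid_predict`, `prior_bagged_predict`, `prior0` and `n_resamples` are hypothetical.

```python
import numpy as np


def centroid_predict(X0, X1, x_new):
    """Return 0 or 1: the class whose sample mean is nearer to x_new."""
    d0 = np.sum((x_new - X0.mean(axis=0)) ** 2)
    d1 = np.sum((x_new - X1.mean(axis=0)) ** 2)
    return 0 if d0 <= d1 else 1


def prior_bagged_predict(X0, X1, x_new, prior0, n_resamples=200, seed=0):
    """Majority vote of centroid classifiers fitted to bootstrap resamples.

    Assumption (not taken from the paper): the per-class resample sizes
    are set proportional to the priors, with at least one point per class.
    """
    rng = np.random.default_rng(seed)
    n = len(X0) + len(X1)
    m0 = min(max(int(round(prior0 * n)), 1), n - 1)  # size of class-0 resample
    m1 = n - m0                                      # size of class-1 resample
    votes_for_1 = 0
    for _ in range(n_resamples):
        # Sample with replacement within each class.
        B0 = X0[rng.integers(0, len(X0), size=m0)]
        B1 = X1[rng.integers(0, len(X1), size=m1)]
        votes_for_1 += centroid_predict(B0, B1, x_new)
    return 1 if votes_for_1 > n_resamples / 2 else 0
```

In this sketch, `prior0` is the only place where prior information enters. With few observations and many dimensions, the distance to a centroid computed from a smaller resample is inflated by sampling noise, so the class given the larger resample tends to be favoured. The number of resamples is a Monte Carlo setting, not a tuning parameter of the classifier.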
Pages: 31-48
Number of pages: 18