Risk-sensitive loss functions for sparse multi-category classification problems

Cited by: 92
Authors
Suresh, S. [1 ]
Sundararajan, N. [1 ]
Saratchandran, P. [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 638768, Singapore
Keywords
multi-category classification; neural network; risk-sensitive loss function; cross-entropy; satellite imaging; micro-array gene expression;
DOI
10.1016/j.ins.2008.02.009
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In this paper, we propose two risk-sensitive loss functions for multi-category classification problems in which the number of training samples is small and/or there is a high imbalance in the number of samples per class. Such problems are common in the bioinformatics and medical diagnosis areas. The loss functions most commonly used in the literature do not perform well on these problems because they minimize only the approximation error and neglect the estimation error caused by the imbalance in the training set. The proposed risk-sensitive loss functions minimize both the approximation and the estimation error. We present an error analysis for the risk-sensitive loss functions along with other well-known loss functions. Using a neural architecture, classifiers incorporating these risk-sensitive loss functions have been developed and their performance evaluated on two real-world multi-class classification problems, viz., a satellite image classification problem and a microarray gene-expression-based cancer classification problem. To study the effectiveness of the proposed loss functions, we deliberately imbalanced the training samples in the satellite image problem and compared the performance of our neural classifiers with those developed using other well-known loss functions. The results indicate the superior performance of the neural classifier using the proposed loss functions, both in terms of overall and per-class classification accuracy. Performance comparisons have also been carried out on a number of benchmark problems where the data is normal, i.e., neither sparse nor imbalanced. The results indicate similar or better performance of the proposed loss functions compared to the well-known loss functions. Published by Elsevier Inc.
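The abstract conveys the core idea, weighting the loss so that errors on under-represented classes carry a higher risk, but does not give the exact functional form of the two proposed losses. The sketch below is therefore only an illustration of that general idea, not the authors' formulation: a hypothetical inverse-frequency risk-weighted cross-entropy in NumPy. The helper names (class_risk_weights, risk_weighted_cross_entropy) and the inverse-frequency weighting scheme are assumptions introduced for illustration.

import numpy as np

def class_risk_weights(y, n_classes):
    # Inverse-frequency risk weights: rare classes get larger weights.
    # This is one common heuristic for imbalanced data; the paper's
    # actual risk factors may be defined differently.
    counts = np.bincount(y, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))

def risk_weighted_cross_entropy(probs, y, weights, eps=1e-12):
    # Mean cross-entropy with a per-class risk weight applied to each sample.
    #   probs   : (n_samples, n_classes) predicted class probabilities
    #   y       : (n_samples,) integer class labels
    #   weights : (n_classes,) per-class risk weights
    sample_losses = -np.log(probs[np.arange(len(y)), y] + eps)
    return np.mean(weights[y] * sample_losses)

# Toy imbalanced 3-class example: class 2 is rare.
y = np.array([0, 0, 0, 0, 1, 1, 1, 2])
probs = np.full((8, 3), 1.0 / 3.0)   # an uninformed classifier
w = class_risk_weights(y, n_classes=3)
print(w)                             # class 2 receives the largest weight
print(risk_weighted_cross_entropy(probs, y, w))

Under plain (unweighted) cross-entropy, a classifier can minimize the training loss by favoring the majority classes; the per-class weights make misclassifying a rare-class sample costlier, which is one way of addressing the estimation error the abstract attributes to training-set imbalance.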
Pages: 2621-2638
Page count: 18