Maximizing sensitivity in medical diagnosis using biased minimax probability machine

Citations: 31
Authors
Huang, KZ [1]
Yang, HQ
King, I
Lyu, MR
Affiliations
[1] Fujitsu Res & Dev Ctr Co Ltd, Informat Technol Lab, Beijing 100016, Peoples R China
[2] Titanium Technol Ltd, Shenzhen 51820, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
biased classification; medical diagnosis; minimax probability machine; worst case accuracy
DOI
10.1109/TBME.2006.872819
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
The challenging task of medical diagnosis based on machine learning techniques requires an inherent bias, i.e., the diagnosis should favor the "ill" class over the "healthy" class, since misdiagnosing a patient as a healthy person may delay the therapy and aggravate the illness. Therefore, the objective in this task is not to improve the overall accuracy of the classification, but to focus on improving the sensitivity (the accuracy of the "ill" class) while maintaining an acceptable specificity (the accuracy of the "healthy" class). Some current methods adopt roundabout ways to impose a certain bias toward the important class, i.e., they try to utilize some intermediate factors to influence the classification. However, it remains uncertain whether these methods can improve the classification performance systematically. In this paper, by engaging a novel learning tool, the biased minimax probability machine (BMPM), we deal with the issue in a more elegant way and directly achieve the objective of appropriate medical diagnosis. More specifically, the BMPM directly controls the worst case accuracies to incorporate a bias toward the "ill" class. Moreover, in a distribution-free way, the BMPM derives the decision rule in such a way as to maximize the worst case sensitivity while maintaining an acceptable worst case specificity. By directly controlling the accuracies, the BMPM provides a more rigorous way to handle medical diagnosis; by deriving a distribution-free decision rule, the BMPM distinguishes itself from a large family of classifiers, namely, the generative classifiers, where an assumption on the data distribution is necessary. We evaluate the performance of the model and compare it with three traditional classifiers: the k-nearest neighbor, the naive Bayesian, and the C4.5. The test results on two medical datasets, the breast-cancer dataset and the heart disease dataset, show that the BMPM outperforms the other three models.
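The idea of a "worst case accuracy" can be illustrated with a small sketch. It assumes a linear decision rule that labels a sample x as "ill" when a·x >= b, and it uses the standard distribution-free bound from the minimax probability machine literature: over all distributions with a given mean and covariance, the probability of landing on the correct side is at least d²/(1 + d²), where d is the margin of the class mean divided by the standard deviation along a. The function names, the toy means and covariances, and the grid search over b are illustrative assumptions, not the optimization procedure of the paper.

```python
def worst_case_accuracy(a, b, mean, cov, side):
    """Distribution-free lower bound on the probability of the correct side.

    side=+1 bounds Pr{a.x >= b} (used here for the "ill" class);
    side=-1 bounds Pr{a.x <= b} (used here for the "healthy" class).
    Only the class mean and covariance are used, no distribution shape.
    """
    n = len(a)
    margin = side * (sum(a[i] * mean[i] for i in range(n)) - b)
    if margin <= 0:
        return 0.0  # the class mean is on the wrong side: no guarantee
    var = sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))
    if var == 0:
        return 1.0  # no spread along a, so the class is always correct
    d_sq = margin * margin / var
    return d_sq / (1.0 + d_sq)


def pick_threshold(a, ill, healthy, beta0, candidates):
    """Hypothetical helper: for a fixed direction a, choose the threshold b
    that maximizes worst-case sensitivity while keeping worst-case
    specificity at or above beta0. Returns (b, sensitivity) or None."""
    best = None
    for b in candidates:
        spec = worst_case_accuracy(a, b, healthy[0], healthy[1], -1)
        if spec < beta0:
            continue  # specificity requirement violated
        sens = worst_case_accuracy(a, b, ill[0], ill[1], +1)
        if best is None or sens > best[1]:
            best = (b, sens)
    return best


# Toy two-feature example with made-up class statistics.
identity = [[1.0, 0.0], [0.0, 1.0]]
ill_stats = ([2.0, 0.0], identity)        # (mean, covariance)
healthy_stats = ([-1.0, 0.0], identity)
direction = [1.0, 0.0]

choice = pick_threshold(direction, ill_stats, healthy_stats,
                        beta0=0.5, candidates=[-1.0, -0.5, 0.0, 0.5, 1.0])
print(choice)
```

Lowering b raises the guaranteed sensitivity and lowers the guaranteed specificity, which is the trade-off the abstract describes; the specificity floor `beta0` decides how far the threshold may move toward the "healthy" class.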
Pages: 821-831
Number of pages: 11