The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling

Cited by: 488
Authors
Ho, Yaoshiang [1]
Wookey, Samuel [1]
Affiliations
[1] ThinkyAI Res, Los Angeles, CA 90027 USA
Keywords
Machine learning; class imbalance; oversampling; undersampling; ethnic stereotypes; social bias; maximum likelihood estimation; cross-entropy; softmax
DOI
10.1109/ACCESS.2019.2962617
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a new metric to measure goodness-of-fit for classifiers: the Real World Cost function. This metric factors in information about a real world problem, such as financial impact, that other measures like accuracy or F1 do not. This metric is also more directly interpretable for users. To optimize for this metric, we introduce the Real-World-Weight Cross-Entropy loss function, in both binary classification and single-label multiclass classification variants. Both variants allow direct input of real world costs as weights. For single-label, multiclass classification, our loss function also allows direct penalization of probabilistic false positives, weighted by label, during the training of a machine learning model. We compare the design of our loss function to the binary cross-entropy and categorical cross-entropy functions, as well as their weighted variants, to discuss the potential for improvement in handling a variety of known shortcomings of machine learning, ranging from imbalanced classes to medical diagnostic error to reinforcement of social bias. We create scenarios that emulate those issues using the MNIST data set and demonstrate empirical results of our new loss function. Finally, we discuss our intuition about why this approach works and sketch a proof based on Maximum Likelihood Estimation.
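The abstract does not state the loss formula, so the following is only a minimal sketch of what a binary cross-entropy with separate real-world cost weights might look like. It assumes the loss scales the false-negative term (true label 1) by one cost and the false-positive term (true label 0) by another. The function name `real_world_weight_bce` and the parameters `fn_cost` and `fp_cost` are hypothetical and are not taken from the paper.

```python
import numpy as np


def real_world_weight_bce(y_true, y_prob, fn_cost, fp_cost, eps=1e-12):
    """Binary cross-entropy with separate cost weights (illustrative sketch).

    ASSUMPTION: the form below is inferred from the abstract's description
    ("direct input of real world costs as weights"); it is not the authors'
    reference implementation.

    y_true  : array-like of 0/1 labels
    y_prob  : array-like of predicted probabilities for the positive class
    fn_cost : hypothetical cost of missing a positive (false negative)
    fp_cost : hypothetical cost of a false alarm (false positive)
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)

    # Penalty on positives predicted with low probability, scaled by fn_cost.
    fn_term = fn_cost * y_true * np.log(y_prob)
    # Penalty on negatives predicted with high probability, scaled by fp_cost.
    fp_term = fp_cost * (1.0 - y_true) * np.log(1.0 - y_prob)

    return float(-np.mean(fn_term + fp_term))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    probs = [0.9, 0.2, 0.4, 0.7]
    # Illustrative costs only: a missed positive is treated as 10x worse
    # than a false alarm.
    print(real_world_weight_bce(labels, probs, fn_cost=10.0, fp_cost=1.0))
```

With both costs set to 1, this sketch reduces to the ordinary binary cross-entropy, which gives a simple sanity check on the weighting.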
Pages: 4806-4813
Page count: 8