Classifier performance assessment in social science - does the quality of data matter?

Times cited: 0
Authors
Stoklasa, Jan [1]
Luukka, Pasi [2, 3]
Talasova, Jana [1]
Affiliations
[1] Palacky Univ, Fac Sci, Dept Math Anal & Applicat Math, Olomouc 77146, Czech Republic
[2] Lappeenranta Univ Technol, Sch Business, Lappeenranta 53850, Finland
[3] Lappeenranta Univ Technol, Dept Math & Phys, Lappeenranta 53850, Finland
Source
Mathematical Methods in Economics (MME 2014) | 2014
Keywords
classification; ROC; ROC curve; area under curve; AUC; quality
DOI
Not available
CLC number
F [Economics]
Subject classification code
02
Abstract
In social science it is common to deal with data of variable quality, as a direct consequence of having human beings as the main source of information. Various measures of data quality have been developed in the humanities, e.g. validity scales or lie scores. In many real-life cases, decisions must be made or diagnoses assigned on the basis of lower-quality data when higher-quality data are not available; any information on data quality is therefore valuable. Real-life decision-making problems in social science can be translated into the language of classification. In the theory of classifier design, various measures of classifier performance have been proposed. To our knowledge, none of these directly reflects the quality (reliability) of the data. Using the example of the receiver operating characteristic (ROC) and the area under the ROC curve (AUC), introduced in signal detection theory, we show how data quality can be incorporated into the performance assessment of classifiers. We present a modification of the ROC approach that reflects data quality and discuss the possible benefits of its use on artificial data.
Pages: 974-979
Number of pages: 6
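The abstract names the ROC curve and the AUC but does not spell out the authors' quality-aware modification. The sketch below is therefore a hedged illustration only. It computes the standard empirical ROC curve, with thresholds placed at each distinct score and tied scores processed together, and integrates it with the trapezoidal rule. As an assumption that is not taken from the paper, it also accepts a hypothetical per-observation quality weight in [0, 1], so that lower-quality records contribute less to the true- and false-positive rates. The function names `roc_points` and `auc_trapezoid` and the weighting scheme are illustrative, not the method proposed by Stoklasa, Luukka and Talasova.

```python
from typing import List, Optional, Sequence, Tuple


def roc_points(
    scores: Sequence[float],
    labels: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """Empirical ROC points (FPR, TPR), from (0, 0) to (1, 1).

    labels: 1 = positive class, 0 = negative class.
    weights: HYPOTHETICAL per-observation data-quality weights in [0, 1];
    None gives every observation weight 1, i.e. the ordinary ROC curve.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if not (len(scores) == len(labels) == len(weights)):
        raise ValueError("scores, labels and weights must have equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    # Total (weighted) mass of positives and negatives: ROC denominators.
    pos_total = sum(w for w, y in zip(weights, labels) if y == 1)
    neg_total = sum(w for w, y in zip(weights, labels) if y == 0)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("need non-zero positive and negative weight")

    # Visit observations from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Tied scores share one threshold, producing a diagonal segment.
        while i < len(order) and scores[order[i]] == threshold:
            j = order[i]
            if labels[j] == 1:
                tp += weights[j]
            else:
                fp += weights[j]
            i += 1
        points.append((fp / neg_total, tp / pos_total))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    print(auc_trapezoid(roc_points(scores, labels)))  # ordinary AUC
    # Hypothetical quality weights: the third record is only half reliable.
    print(auc_trapezoid(roc_points(scores, labels, [1.0, 1.0, 0.5, 1.0])))
```

With all weights equal to 1, the trapezoidal AUC equals the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counting one half. For the scores and labels above it is 0.75. Halving the weight of the third record raises it to 5/6, because the mis-ranked pair involving that record then counts for less. Whether this weighting matches the modification in the paper is not stated in the abstract.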