Towards Rough Set Theory for Outliers Detection in Questionnaire Data

Times cited: 0
Authors
Uher, Vojtech [1]
Drazdilova, Pavla [1]
Affiliations
[1] VSB Tech Univ Ostrava, Dept Comp Sci, 17 Listopadu 15-2172, Ostrava 70833, Czech Republic
Source
COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2023 | 2023 / Vol. 14164
Keywords
Outliers detection; Questionnaire data; Rough set theory; HBSC;
DOI
10.1007/978-3-031-42823-4_23
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Manual processing of questionnaire surveys takes a lot of time and effort. This article aims at the automatic detection of corrupted or inappropriate responses in questionnaire data using unsupervised outlier detection methods. Unlike numerical data, which are usually assessed by distance-based methods, the entries in questionnaires need to be assessed from multiple perspectives. This paper proposes a novel algorithm utilizing rough sets, which capture relations among attributes (questions). Rough set theory is based on the granularity of data and is used here to find combinations of attributes that identify the discernible questionnaires. The method is compared with standard and recent outlier detection algorithms based on distance, entropy, correlation, and probability. The tests are computed on the real-world HBSC dataset in several experiments. The rough set score computed on combinations of three attributes is preferred, as it returns significant outliers that also reflect the multiple perspectives investigated by the other types of methods.
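The idea of scoring questionnaires by how often they are discernible under small attribute combinations can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes a simplified score: for every combination of k attributes, records are grouped into indiscernibility classes (identical answers on those attributes), and a record's score is the fraction of combinations in which it is alone in its class. The function name rough_outlier_scores and the scoring rule are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rough_outlier_scores(records, k=3):
    """Illustrative score, NOT the published method (assumption).

    records: list of equal-length tuples of categorical answers.
    Returns, per record, the fraction of k-attribute combinations in
    which the record forms a singleton indiscernibility class.
    """
    n_attrs = len(records[0])
    combos = list(combinations(range(n_attrs), k))
    if not combos:
        raise ValueError("k must not exceed the number of attributes")

    singleton_counts = [0] * len(records)
    for combo in combos:
        # Project every record onto the chosen attributes.
        keys = [tuple(rec[i] for i in combo) for rec in records]
        # Size of each indiscernibility class under this combination.
        class_sizes = Counter(keys)
        for idx, key in enumerate(keys):
            if class_sizes[key] == 1:
                singleton_counts[idx] += 1

    return [count / len(combos) for count in singleton_counts]
```

Under this assumed rule, a score near 1 means the questionnaire is distinguishable from all others under almost every attribute triple, while a score near 0 means it always shares its answers with at least one other questionnaire.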
Pages: 310-324
Number of pages: 15