Intra- and Inter-rater Agreement in a Subjective Speech Quality Assessment Task in Crowdsourcing

Cited by: 4

Authors:

Jimenez, Rafael Zequeira [1]

Llagostera, Anna [2]

Naderi, Babak [1]

Moeller, Sebastian [3]

Berger, Jens [2]

Affiliations:

[1] Tech Univ Berlin, Berlin, Germany

[2] Rohde & Schwarz SwissQual AG, Zuchwil, Switzerland

[3] Tech Univ Berlin, DFKI Projektburo Berlin, Berlin, Germany

Source:

COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019

Keywords:

inter-rater reliability; speech quality assessment; crowdsourcing; listeners' agreement; subjectivity in crowdsourcing;

DOI:

10.1145/3308560.3317084

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Crowdsourcing is a great tool for conducting subjective user studies with large numbers of users. Collecting reliable annotations about the quality of speech stimuli is challenging: the task itself is highly subjective, and users in crowdsourcing work without supervision. This work investigates the intra- and inter-listener agreement within a subjective speech quality assessment task. To this end, a study was conducted in the laboratory and in crowdsourcing in which listeners were asked to rate speech stimuli with respect to their overall quality. Ratings were collected on a 5-point scale in accordance with ITU-T Rec. P.800 and P.808, respectively. The speech samples were taken from the ITU-T Rec. P.501 Annex D database and were presented four times to the listeners. Finally, the crowdsourcing results were contrasted with the ratings collected in the laboratory. A strong and significant Spearman's correlation was achieved when contrasting the ratings collected in both environments. Our analysis shows that while the inter-rater agreement increased the more the listeners conducted the assessment task, the intra-rater reliability remained constant. Our study setup helped to overcome the subjectivity of the task, and we found that disagreement can represent a source of information to some extent.
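The abstract reports a Spearman's correlation between the ratings collected in the laboratory and in crowdsourcing. As a rough illustration only, the sketch below shows how such a comparison could be computed from per-stimulus mean ratings. It is not the authors' analysis code: the ratings are made-up placeholder values, and the names `average_ranks`, `pearson`, `spearman`, `lab_ratings`, and `cs_ratings` are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 5-point ratings per stimulus (NOT data from the paper).
lab_ratings = {"s1": [1, 2, 1], "s2": [3, 3, 4], "s3": [5, 4, 5], "s4": [2, 3, 2]}
cs_ratings = {"s1": [2, 1, 2], "s2": [3, 4, 4], "s3": [4, 5, 5], "s4": [3, 2, 3]}

stimuli = sorted(lab_ratings)
lab_mos = [mean(lab_ratings[s]) for s in stimuli]  # mean rating per stimulus
cs_mos = [mean(cs_ratings[s]) for s in stimuli]

rho = spearman(lab_mos, cs_mos)
print(f"Spearman's rho between lab and crowdsourcing means: {rho:.3f}")
```

Working on ranks rather than raw means makes the comparison insensitive to a constant offset between the two environments, which is one plausible reason to prefer a rank correlation here. A significance test would need an additional step that this sketch omits.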


Pages: 1138 - 1143

Page count: 6

50 items in total

[41] Inter-rater and intra-rater reliability of isotonic exercise monitoring device for measuring active knee extension
Limsakul, Chonnanid
Sengchuai, Kiattisak
Duangsoithong, Rakkrit
Jindapetch, Nattha
Jaruenpunyasak, Jermphiphut
PEERJ, 2023, 11
[42] Visualizing Agreement: Bland-Altman Plots as a Supplement to Inter-Rater Reliability Indices
Barr, Brogan L.
McIntosh, Virginia V. W.
Britt, Eileen F.
Jordan, Jennifer
Carter, Janet D.
MEASUREMENT-INTERDISCIPLINARY RESEARCH AND PERSPECTIVES, 2024, 22 (02) : 175 - 187
[43] Impact of the Number of Votes on the Reliability and Validity of Subjective Speech Quality Assessment in the Crowdsourcing Approach
Naderi, Babak
Hossfeld, Tobias
Hirth, Matthias
Metzger, Florian
Moeller, Sebastian
Jimenez, Rafael Zequeira
2020 TWELFTH INTERNATIONAL CONFERENCE ON QUALITY OF MULTIMEDIA EXPERIENCE (QOMEX), 2020
[44] Inter-rater and intra-rater reliability in the interpretation of MTI photoscreener photographs of native American preschool children
Mohan, KM
Miller, JM
Dobson, V
Harvey, EM
Sherrill, DL
OPTOMETRY AND VISION SCIENCE, 2000, 77 (09) : 473 - 482
[45] Intra-rater and Inter-rater Reliability of the Commander Pressure Algometer in Greek Patients With Chronic Neck Pain
Skordis, Charalampos
Liaskou, Christina
Papagiakoumou, Evangelia
Sotiropoulos, Spyridon
Plavoukou, Theodora
Karakasidou, Palina
Georgoudis, George
CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (08)
[46] Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment
Mancar, Sinem Arslan
Gulleroglu, H. Deniz
INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, 2022, 9 (02) : 515 - 533
[47] Inter-rater reliability of the Abbreviated Injury Scale scores in patients with severe head injury shows good inter-rater agreement but variability between countries. An inter-country comparison study
Amy C. Gunning
Menco J. S. Niemeyer
Mark van Heijl
Karlijn J. P. van Wessem
Ronald V. Maier
Zsolt J. Balogh
Luke P. H. Leenen
European Journal of Trauma and Emergency Surgery, 2023, 49 : 1183 - 1188
[48] Inter-rater reliability of the assessment of adverse drug reactions in the hospitalised elderly
Tangiisuran, B.
Auyeung, V.
Cheek, L.
Rajkumar, C.
Davies, G.
JOURNAL OF NUTRITION HEALTH & AGING, 2013, 17 (08) : 700 - 705
[49] Inter-rater reliability of the assessment of adverse drug reactions in the hospitalised elderly
B. Tangiisuran
V. Auyeung
L. Cheek
C. Rajkumar
J. Graham Davies
The journal of nutrition, health & aging, 2013, 17 : 700 - 705
[50] Inter-rater reliability of the Abbreviated Injury Scale scores in patients with severe head injury shows good inter-rater agreement but variability between countries. An inter-country comparison study
Gunning, Amy C.
Niemeyer, Menco J. S.
van Heijl, Mark
van Wessem, Karlijn J. P.
Maier, Ronald V.
Balogh, Zsolt J.
Leenen, Luke P. H.
EUROPEAN JOURNAL OF TRAUMA AND EMERGENCY SURGERY, 2023, 49 (03) : 1183 - 1188
