Intra- and Inter-rater Agreement in a Subjective Speech Quality Assessment Task in Crowdsourcing

Cited by: 4
Authors
Jimenez, Rafael Zequeira [1 ]
Llagostera, Anna [2 ]
Naderi, Babak [1 ]
Moeller, Sebastian [3 ]
Berger, Jens [2 ]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] Rohde & Schwarz SwissQual AG, Zuchwil, Switzerland
[3] Tech Univ Berlin, DFKI Projektbüro Berlin, Berlin, Germany
Source
COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019
Keywords
inter-rater reliability; speech quality assessment; crowdsourcing; listeners' agreement; subjectivity in crowdsourcing;
DOI
10.1145/3308560.3317084
CLC Number
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Crowdsourcing is a powerful tool for conducting subjective user studies with large numbers of users. However, collecting reliable annotations about the quality of speech stimuli is challenging: the task itself is highly subjective, and crowdworkers operate without supervision. This work investigates intra- and inter-listener agreement within a subjective speech quality assessment task. To this end, a study was conducted both in the laboratory and in crowdsourcing, in which listeners were asked to rate speech stimuli with respect to their overall quality. Ratings were collected on a 5-point scale in accordance with ITU-T Rec. P.800 and P.808, respectively. The speech samples were taken from the ITU-T Rec. P.501 Annex D database and were presented to the listeners four times. Finally, the crowdsourcing results were contrasted with the ratings collected in the laboratory. A strong and significant Spearman correlation was obtained between the ratings collected in the two environments. Our analysis shows that while inter-rater agreement increased the more often listeners performed the assessment task, intra-rater reliability remained constant. Our study setup helped to overcome the subjectivity of the task, and we found that disagreement can, to some extent, serve as a source of information.
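The abstract mentions two statistics: a Spearman correlation between per-stimulus ratings collected in the laboratory and in crowdsourcing, and intra-rater reliability over repeated presentations of the same stimuli. The sketch below illustrates how such figures could be computed from raw rating data; it is not the authors' analysis code, and the table layout and column names (stimulus, rater, repetition, rating, environment) are assumptions for illustration only.

```python
# Minimal sketch (assumed data layout, not the paper's code): one row per
# individual rating with columns 'stimulus', 'rater', 'repetition' (1-4),
# 'rating' (1-5 MOS scale) and 'environment' ('lab' or 'crowd').
import pandas as pd
from scipy.stats import spearmanr


def lab_vs_crowd_correlation(ratings: pd.DataFrame) -> float:
    """Spearman correlation between per-stimulus MOS in lab and crowdsourcing."""
    # Average all ratings per stimulus and environment, then align stimuli.
    mos = ratings.groupby(["environment", "stimulus"])["rating"].mean().unstack(0)
    rho, _p = spearmanr(mos["lab"], mos["crowd"])
    return rho


def intra_rater_reliability(ratings: pd.DataFrame, rater: str) -> float:
    """Correlate one listener's first and second pass over the same stimuli
    (the study presented each stimulus four times; this uses two passes)."""
    r = ratings[ratings["rater"] == rater]
    passes = r.pivot_table(index="stimulus", columns="repetition", values="rating")
    rho, _p = spearmanr(passes[1], passes[2])
    return rho
```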
Pages: 1138-1143
Number of pages: 6