Intra- and Inter-rater Agreement in a Subjective Speech Quality Assessment Task in Crowdsourcing

Cited by: 4
Authors
Jimenez, Rafael Zequeira [1]
Llagostera, Anna [2]
Naderi, Babak [1]
Moeller, Sebastian [3]
Berger, Jens [2]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] Rohde & Schwarz SwissQual AG, Zuchwil, Switzerland
[3] Tech Univ Berlin, DFKI Projektburo Berlin, Berlin, Germany
Source
COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019
Keywords
inter-rater reliability; speech quality assessment; crowdsourcing; listeners' agreement; subjectivity in crowdsourcing
DOI
10.1145/3308560.3317084
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Crowdsourcing is a powerful tool for conducting subjective user studies with large numbers of users. However, collecting reliable annotations of the quality of speech stimuli is challenging: the task itself is highly subjective, and crowdworkers work without supervision. This work investigates intra- and inter-listener agreement within a subjective speech quality assessment task. To this end, a study was conducted both in the laboratory and in crowdsourcing, in which listeners were asked to rate speech stimuli with respect to their overall quality. Ratings were collected on a 5-point scale in accordance with ITU-T Rec. P.800 (laboratory) and P.808 (crowdsourcing), respectively. The speech samples were taken from the ITU-T Rec. P.501 Annex D database and were presented to the listeners four times. Finally, the crowdsourcing results were contrasted with the ratings collected in the laboratory, yielding a strong and significant Spearman's correlation between the two environments. Our analysis shows that while inter-rater agreement increased the more often listeners performed the assessment task, intra-rater reliability remained constant. Our study setup helped to overcome the subjectivity of the task, and we found that disagreement can, to some extent, be a source of information.
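The abstract compares laboratory and crowdsourcing ratings via Spearman's correlation and contrasts intra- with inter-rater agreement. Below is a minimal Python sketch of how such figures could be computed, assuming ratings are stored per stimulus as 1 to 5 ACR votes. The agreement index `mean_pairwise_rho`, all function names, and the example data are hypothetical illustrations; the abstract does not say which intra- and inter-rater measures the authors actually used.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mos_per_stimulus(ratings):
    """Mean opinion score per stimulus from 1..5 ACR votes (dict: id -> list)."""
    return {stim: mean(votes) for stim, votes in ratings.items()}


def mean_pairwise_rho(rating_vectors):
    """Hypothetical agreement index: mean Spearman's rho over all pairs of vectors.

    Each vector holds scores for the same stimuli in the same order.
    Pass one listener's repetitions for an intra-rater figure, or several
    listeners' scores from one repetition for an inter-rater figure.
    """
    return mean(spearman(a, b) for a, b in combinations(rating_vectors, 2))


# Hypothetical example data, not taken from the study.
lab = {"s1": [4, 5, 4], "s2": [2, 1, 2], "s3": [3, 3, 4], "s4": [5, 4, 5]}
crowd = {"s1": [4, 4, 5, 3], "s2": [1, 2, 2, 1], "s3": [3, 2, 3, 3], "s4": [5, 5, 4, 4]}

stimuli = sorted(lab)
lab_mos = mos_per_stimulus(lab)
crowd_mos = mos_per_stimulus(crowd)
rho_lab_vs_crowd = spearman([lab_mos[s] for s in stimuli], [crowd_mos[s] for s in stimuli])

# One hypothetical listener rating stimuli s1..s4 in three repetitions.
one_listener_reps = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 3, 4]]
intra = mean_pairwise_rho(one_listener_reps)
```

A rank-based coefficient suits ordinal 5-point scores because it ignores the spacing between scale labels; indices such as the intraclass correlation coefficient or Krippendorff's alpha are common alternatives for the agreement part.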
Pages: 1138-1143
Number of pages: 6
Related papers (50 in total)
  • [41] Inter-rater and intra-rater reliability of isotonic exercise monitoring device for measuring active knee extension
    Limsakul, Chonnanid
    Sengchuai, Kiattisak
    Duangsoithong, Rakkrit
    Jindapetch, Nattha
    Jaruenpunyasak, Jermphiphut
    PEERJ, 2023, 11
  • [42] Visualizing Agreement: Bland-Altman Plots as a Supplement to Inter-Rater Reliability Indices
    Barr, Brogan L.
    McIntosh, Virginia V. W.
    Britt, Eileen F.
    Jordan, Jennifer
    Carter, Janet D.
    MEASUREMENT-INTERDISCIPLINARY RESEARCH AND PERSPECTIVES, 2024, 22 (02) : 175 - 187
  • [43] Impact of the Number of Votes on the Reliability and Validity of Subjective Speech Quality Assessment in the Crowdsourcing Approach
    Naderi, Babak
    Hossfeld, Tobias
    Hirth, Matthias
    Metzger, Florian
    Moeller, Sebastian
    Jimenez, Rafael Zequeira
    2020 TWELFTH INTERNATIONAL CONFERENCE ON QUALITY OF MULTIMEDIA EXPERIENCE (QOMEX), 2020
  • [44] Inter-rater and intra-rater reliability in the interpretation of MTI photoscreener photographs of native American preschool children
    Mohan, KM
    Miller, JM
    Dobson, V
    Harvey, EM
    Sherrill, DL
    OPTOMETRY AND VISION SCIENCE, 2000, 77 (09) : 473 - 482
  • [45] Intra-rater and Inter-rater Reliability of the Commander Pressure Algometer in Greek Patients With Chronic Neck Pain
    Skordis, Charalampos
    Liaskou, Christina
    Papagiakoumou, Evangelia
    Sotiropoulos, Spyridon
    Plavoukou, Theodora
    Karakasidou, Palina
    Georgoudis, George
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (08)
  • [46] Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment
    Mancar, Sinem Arslan
    Gulleroglu, H. Deniz
    INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, 2022, 9 (02) : 515 - 533
  • [47] Inter-rater reliability of the Abbreviated Injury Scale scores in patients with severe head injury shows good inter-rater agreement but variability between countries. An inter-country comparison study
    Gunning, Amy C.
    Niemeyer, Menco J. S.
    van Heijl, Mark
    van Wessem, Karlijn J. P.
    Maier, Ronald V.
    Balogh, Zsolt J.
    Leenen, Luke P. H.
    EUROPEAN JOURNAL OF TRAUMA AND EMERGENCY SURGERY, 2023, 49 (03) : 1183 - 1188
  • [48] Inter-rater reliability of the assessment of adverse drug reactions in the hospitalised elderly
    Tangiisuran, B.
    Auyeung, V.
    Cheek, L.
    Rajkumar, C.
    Davies, G.
    JOURNAL OF NUTRITION HEALTH & AGING, 2013, 17 (08) : 700 - 705