Statistical quality estimation for partially subjective classification tasks through crowdsourcing

Citations: 0
Authors
Yoshinao Sato
Kouki Miyazawa
Affiliations
[1] Fairy Devices Inc.
Source
Language Resources and Evaluation | 2023, Vol. 57
Keywords
Crowdsourcing; Quality estimation; Latent variable model; Partially subjective task;
Abstract
When constructing a large-scale data resource, the quality of artifacts has great significance, especially when they are generated by creators through crowdsourcing. A widely used approach is to estimate the quality of each artifact based on evaluations by reviewers. However, the commonly used vote-counting method to aggregate reviewers’ evaluations does not work effectively for partially subjective tasks. In such a task, a single correct answer cannot necessarily be defined. We propose a statistical quality estimation method for partially subjective classification tasks to infer the quality of artifacts considering the abilities and biases of creators and reviewers as latent variables. In our experiments, we use the partially subjective task of classifying speech into one of the following four attitudes: agreement, disagreement, stalling, and question. We collect a speech corpus through crowdsourcing and apply the proposed method to it. The results show that the proposed method estimates the quality of speech more effectively than vote aggregation, as measured by correlation with a fine-grained classification performed by experts. Furthermore, we compare the speech attitude classification performance of a neural network model on two subsets of our corpus extracted using the voting and proposed methods. The results indicate that we can effectively extract a consistent and high-quality subset of a corpus using the proposed method. This method facilitates the efficient collection of large-scale data resources for mutually exclusive classification, even if the task is partially subjective.
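The latent-variable approach the abstract contrasts with vote counting can be illustrated with a one-coin Dawid–Skene-style EM estimator. This is a simplified sketch under binary labels, not the paper's model (which additionally models creator ability and reviewer bias over four mutually exclusive attitude classes); the function name `em_quality` and the data layout are illustrative assumptions.

```python
# Illustrative one-coin Dawid-Skene-style EM, NOT the paper's full model:
# each reviewer r has a single latent accuracy a_r, and each artifact i
# has a latent binary quality z_i; both are estimated jointly, so votes
# from reliable reviewers weigh more than in plain vote aggregation.

def em_quality(votes, n_iter=50):
    """votes: iterable of (item, reviewer, label) with label in {0, 1}.
    Returns (posterior P(z_i = 1) per item, estimated accuracy per reviewer)."""
    votes = list(votes)
    items = {i for i, _, _ in votes}
    reviewers = {r for _, r, _ in votes}
    # Initialize item posteriors with the per-item vote average,
    # i.e., the vote-counting baseline the abstract mentions.
    post = {
        i: sum(l for i2, _, l in votes if i2 == i)
           / sum(1 for i2, _, _ in votes if i2 == i)
        for i in items
    }
    acc = {r: 0.7 for r in reviewers}
    for _ in range(n_iter):
        # M-step: a reviewer's accuracy is their expected agreement
        # with the current soft labels.
        for r in reviewers:
            num = den = 0.0
            for i, r2, l in votes:
                if r2 == r:
                    num += post[i] if l == 1 else 1.0 - post[i]
                    den += 1.0
            acc[r] = min(max(num / den, 1e-6), 1.0 - 1e-6)
        # E-step: posterior of z_i = 1 under a uniform prior, multiplying
        # each reviewer's likelihood of having cast their observed vote.
        for i in items:
            p1 = p0 = 1.0
            for i2, r, l in votes:
                if i2 == i:
                    p1 *= acc[r] if l == 1 else 1.0 - acc[r]
                    p0 *= (1.0 - acc[r]) if l == 1 else acc[r]
            post[i] = p1 / (p1 + p0)
    return post, acc
```

With two careful reviewers and one reviewer who always votes 1, the estimator drives the careless reviewer's accuracy toward chance and the item posteriors toward the careful reviewers' labels, whereas raw vote counts would leave every item tainted by the constant voter.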
Pages: 31-56
Number of pages: 25