Embedding Learning for Preference-based Speech Quality Assessment

Cited by: 0

Authors:

Hu, Cheng-Hung [1]

Yasuda, Yusuke [1]

Toda, Tomoki [1]

Affiliation:

[1] Nagoya University, Nagoya, Aichi, Japan

Source:

INTERSPEECH 2024 | 2024

Keywords:

Speech Preference Assessment; Speech Quality Assessment; Pairwise Comparison; MOS

DOI:

10.21437/Interspeech.2024-1243

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

One goal of Speech Quality Assessment is to compare the quality of different utterances. Recently, several preference-based models have been developed. These models typically use comparisons of mean opinion scores (MOS) as preference scores during training. However, they often treat pairs of utterances with large MOS differences and pairs with similar MOS equally, which increases the cost of accurate MOS prediction. To tackle this issue, this study proposes an embedding loss that brings the embeddings of utterance pairs with similar MOS closer together while separating those with dissimilar MOS. Our experiments show that models trained with the embedding loss perform better in both in-domain and out-of-domain scenarios. Furthermore, we use t-SNE visualization to analyze the distribution of embeddings extracted by models trained with and without the embedding loss. The results indicate that embeddings of utterances with similar MOS are brought closer together, whereas those with differing MOS are effectively separated.
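The idea of pulling together embeddings of utterances with similar MOS and pushing apart those with dissimilar MOS can be illustrated with a generic contrastive-style loss. The sketch below is an assumption made for illustration, not the formulation used in the paper: the function names, the `mos_threshold` that decides which pairs count as "similar", the `margin`, and the use of Euclidean distance are all hypothetical choices. A helper also shows how a MOS comparison could be turned into a preference target.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_embedding_loss(emb_a, emb_b, mos_a, mos_b,
                        mos_threshold=0.5, margin=1.0):
    """Contrastive-style loss for one utterance pair (illustrative only).

    Assumption: pairs whose MOS differ by less than `mos_threshold` are
    treated as similar and penalized by their squared distance; all other
    pairs are penalized only if they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if abs(mos_a - mos_b) < mos_threshold:
        return d ** 2                      # pull similar-MOS pairs together
    return max(0.0, margin - d) ** 2       # push dissimilar-MOS pairs apart


def preference_label(mos_a, mos_b):
    """Preference target from a MOS comparison (hypothetical encoding):
    1.0 if utterance A is rated higher, 0.0 if B is, 0.5 on a tie."""
    if mos_a > mos_b:
        return 1.0
    if mos_a < mos_b:
        return 0.0
    return 0.5
```

In this sketch, a pair with MOS 1.5 and 4.5 whose embeddings are only 0.5 apart incurs a loss of 0.25 under the default margin, while the same pair incurs no loss once the embeddings are farther apart than the margin. In a real system such a term would presumably be added to the preference loss with some weighting, which is left out here.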

Citation

Pages: 2685-2689

Page count: 5
