Embedding Learning for Preference-based Speech Quality Assessment

Citations: 0
Authors
Hu, Cheng-Hung [1]
Yasuda, Yusuke [1]
Toda, Tomoki [1]
Affiliations
[1] Nagoya Univ, Nagoya, Aichi, Japan
Source
INTERSPEECH 2024 | 2024
Keywords
Speech Preference Assessment; Speech Quality Assessment; Pairwise Comparison; MOS;
DOI
10.21437/Interspeech.2024-1243
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One goal of Speech Quality Assessment is to compare the quality of different utterances. Recently, several preference-based models have been developed. These models typically use comparisons of MOS as preference scores during training. However, they often treat pairs of utterances with large differences in MOS and pairs with similar MOS equally, which increases the cost of accurate MOS prediction. To tackle this issue, this study proposes using an embedding loss that brings the embeddings of utterance pairs with similar MOS closer together while separating those with dissimilar MOS. Our experiments show that models trained with the embedding loss perform better in both in-domain and out-of-domain scenarios. Furthermore, we use t-SNE visualization to analyze the distribution of embeddings extracted by models trained with and without the embedding loss. The results indicate that embeddings of utterances with similar MOS are brought closer together, whereas those with differing MOS are effectively separated.
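The idea of pulling together embeddings of utterances with similar MOS and pushing apart those with dissimilar MOS can be illustrated with a generic contrastive-style loss. The sketch below is not the authors' formulation, which this record does not give. The function name `embedding_loss`, the Euclidean distance, the `mos_threshold` used to decide whether two utterances count as "similar", and the `margin` are all assumptions made for illustration.

```python
import numpy as np


def embedding_loss(emb_a, emb_b, mos_a, mos_b, mos_threshold=0.5, margin=1.0):
    """Contrastive-style loss over pairs of utterance embeddings.

    Hypothetical sketch: the threshold, margin, and distance metric are
    illustrative assumptions, not values taken from the paper.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    mos_a = np.asarray(mos_a, dtype=float)
    mos_b = np.asarray(mos_b, dtype=float)

    # Euclidean distance between the two embeddings of each pair.
    dist = np.linalg.norm(emb_a - emb_b, axis=-1)

    # A pair counts as "similar" when its MOS gap is below the threshold.
    similar = np.abs(mos_a - mos_b) < mos_threshold

    # Similar pairs are penalized for being far apart;
    # dissimilar pairs are penalized for being closer than the margin.
    pull = dist ** 2
    push = np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(np.where(similar, pull, push)))


# Two pairs: the first has similar MOS (3.0 vs 3.2), the second does not (4.5 vs 2.0).
loss = embedding_loss(
    emb_a=[[0.0, 0.0], [0.0, 0.0]],
    emb_b=[[0.6, 0.8], [0.3, 0.4]],
    mos_a=[3.0, 4.5],
    mos_b=[3.2, 2.0],
)
print(loss)
```

In a training setup such a term would presumably be added to the preference loss rather than replace it. How the two are weighted is not stated in this record.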
Pages: 2685-2689
Page count: 5