Accent Recognition with Hybrid Phonetic Features

Cited by: 10
Authors
Zhang, Zhan [1]
Wang, Yuehai [1]
Yang, Jianyi [1]
Affiliations
[1] Zhejiang Univ, Dept Informat & Elect Engn, Hangzhou 310007, Peoples R China
Keywords
accent recognition; audio classification; accented English speech recognition; IDENTIFICATION
DOI
10.3390/s21186258
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.
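The relative improvement quoted in the abstract can be read as the gain over the baseline divided by the baseline score. The following is a minimal sketch of that arithmetic, assuming a higher-is-better metric such as accuracy. The function name and the two scores are hypothetical placeholders chosen for illustration; they are not figures reported in the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Return the relative gain of `proposed` over `baseline`, in percent.

    Assumes a higher-is-better metric (for example, accuracy).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (proposed - baseline) / baseline * 100.0


# Hypothetical scores, NOT taken from the paper: they are picked only so
# that the result lands near the 8.02% figure quoted in the abstract.
hypothetical_baseline = 50.0
hypothetical_proposed = 54.01

gain = relative_improvement(hypothetical_baseline, hypothetical_proposed)
print(f"relative improvement: {gain:.2f}%")
```

A relative figure depends on the baseline: the same absolute gain yields a larger relative improvement when the baseline score is lower.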
Pages: 14