Accent Recognition with Hybrid Phonetic Features

Cited by: 10

Authors:

Zhang, Zhan [1]

Wang, Yuehai [1]

Yang, Jianyi [1]

Affiliation:

[1] Zhejiang Univ, Dept Informat & Elect Engn, Hangzhou 310007, Peoples R China

Source:

SENSORS | 2021 / Vol. 21 / Issue 18

Keywords:

accent recognition; audio classification; accented English speech recognition; IDENTIFICATION;

DOI:

10.3390/s21186258

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.

Pages: 14
