Hypo and Hyperarticulated Speech Data Augmentation for Spontaneous Speech Recognition

Cited by: 0

Authors:

Lee, Sung Joo [1]

Kang, Byung-Ok [1]

Chung, Hoon [1]

Park, Jeon Gue [1]

Lee, Yun Keun [1]

Affiliations:

[1] Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea

Source:

2018 26th European Signal Processing Conference (EUSIPCO) | 2018

Keywords:

Speech recognition; data augmentation; hypo and hyperarticulation; speech synthesis

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic and Communication Technology]

Subject classification codes:

0808; 0809

Abstract:

Among the many challenges in spontaneous speech recognition, we focus on the variability of speech with the degree of articulation, namely hypo- and hyperarticulation. In this paper, we investigate whether findings from past acoustic-phonetic studies of speech variability can be applied to data augmentation for a spontaneous speech recognition system. To do so, we develop data augmentation approaches that reflect the acoustic-phonetic characteristics of hypo- and hyperarticulated speech. Because our approaches are based on signal processing methods, they do not require a model learned from supervised or unsupervised data. A series of speech recognition tests is conducted across various speech styles. The results show that our approaches achieve a meaningful performance gain. They also indicate that past acoustic-phonetic knowledge of speech variability is useful for improving the recognition performance of spontaneous speech, including hypo- and hyperarticulated speech.


Pages: 2080-2084

Page count: 5
