AN INVESTIGATION INTO USING PARALLEL DATA FOR FAR-FIELD SPEECH RECOGNITION

Cited by: 0
Authors
Qian, Yanmin [1, 2]
Tan, Tian [1 ]
Yu, Dong [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Univ Cambridge, Dept Engn, Cambridge, England
[3] Microsoft Res, Redmond, WA USA
Source
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016
Keywords
Far-field speech recognition; Deep neural network; Multi-task learning; Feature denoising; Parallel data; DEEP NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Far-field speech recognition is an important yet challenging task because of its low signal-to-noise ratio. In this paper, three novel deep neural network architectures are explored to improve far-field speech recognition accuracy by exploiting parallel far-field and close-talk recordings. All three architectures use multi-task learning for model optimization but focus on three different ideas: joint learning of dereverberation and recognition, knowledge sharing between close-talk and far-field models, and environment-code aware training. Experiments on the AMI single distant microphone (SDM) task show that each of the proposed methods can boost accuracy individually, and that additional improvement can be obtained with appropriate integration of these models. Overall, we reduced the error rate on the SDM set by a relative 10% by exploiting the close-talk individual headset microphone (IHM) data.
Pages: 5725-5729
Number of pages: 5