Combining standard and throat microphones for robust speech recognition

Cited by: 67
Authors
Graciarena, M
Franco, H
Sonmez, K
Bratt, H
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
[2] Univ Buenos Aires, Sch Engn, Inst Biomed Engn, RA-1053 Buenos Aires, DF, Argentina
Keywords
noise robustness; probabilistic optimum filtering; speech recognition; throat microphone;
DOI
10.1109/LSP.2003.808549
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
We present a method to combine the standard and throat microphone signals for robust speech recognition in noisy environments. Our approach is to use the probabilistic optimum filter (POF) mapping algorithm to estimate the standard microphone clean-speech feature vectors, used by standard speech recognizers, from both microphones' noisy-speech feature vectors. A small untranscribed "stereo" database (noisy and clean simultaneous recordings) is required to train the POF mappings. In continuous-speech recognition experiments using SRI International's DECIPHER recognition system, both with artificially added noise and with recorded noisy speech, the combined-microphone approach significantly outperforms the single-microphone approach.
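The abstract describes a piecewise-linear mapping from concatenated noisy (standard + throat) feature vectors to clean standard-microphone features, trained on simultaneously recorded stereo data. The following is a minimal, illustrative sketch of that idea only: it partitions the noisy feature space with a simple k-means and fits one least-squares affine transform per region. The class name `POFMapper` and every detail here are hypothetical; the actual POF algorithm uses probabilistic (soft) region weighting rather than the hard assignments used below.

```python
# Illustrative sketch of a POF-style piecewise-linear feature mapping.
# NOT SRI's implementation: hard k-means regions stand in for POF's
# probabilistic region weighting.
import numpy as np

class POFMapper:
    def __init__(self, n_regions=4, seed=0):
        self.n_regions = n_regions
        self.rng = np.random.default_rng(seed)

    def fit(self, noisy, clean, n_iter=10):
        # noisy: (N, Dn) concatenated standard+throat noisy features
        # clean: (N, Dc) simultaneously recorded clean features
        # 1) Partition the noisy feature space with a simple k-means.
        idx = self.rng.choice(len(noisy), self.n_regions, replace=False)
        centers = noisy[idx].copy()
        for _ in range(n_iter):
            d = ((noisy[:, None] - centers[None]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for k in range(self.n_regions):
                if (labels == k).any():
                    centers[k] = noisy[labels == k].mean(0)
        self.centers = centers
        # 2) Fit one least-squares affine transform per region.
        X = np.hstack([noisy, np.ones((len(noisy), 1))])  # bias column
        self.W = []
        for k in range(self.n_regions):
            m = labels == k
            if m.sum() >= X.shape[1]:
                Wk, *_ = np.linalg.lstsq(X[m], clean[m], rcond=None)
            else:  # too few points in region: fall back to a global fit
                Wk, *_ = np.linalg.lstsq(X, clean, rcond=None)
            self.W.append(Wk)
        return self

    def transform(self, noisy):
        # Apply the transform of whichever region each frame falls in.
        X = np.hstack([noisy, np.ones((len(noisy), 1))])
        d = ((noisy[:, None] - self.centers[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        out = np.empty((len(noisy), self.W[0].shape[1]))
        for k in range(self.n_regions):
            m = labels == k
            if m.any():
                out[m] = X[m] @ self.W[k]
        return out
```

In the paper's setting, `noisy` would hold stacked cepstral features from both microphones and `clean` the close-talking clean-speech cepstra; the trained mapping then feeds a standard recognizer such as DECIPHER.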
Pages: 72-74
Page count: 3
Related Papers
50 items total
  • [1] Feature Vector Normalization with Combined Standard and Throat Microphones for Robust ASR
    Buera, Luis
    Miguel, Antonio
    Saz, Oscar
    Ortega, Alfonso
    Lleida, Eduardo
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1289 - 1292
  • [2] Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech
    Sahidullah, Md
    Hautamaki, Rosa Gonzalez
    Thomsen, Dennis Alexander Lehmann
    Kinnunen, Tomi
    Tan, Zheng-Hua
    Hautamaki, Ville
    Parts, Robert
    Pitkanen, Martti
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1720 - 1724
  • [3] Robust Speech Recognition Combining Cepstral and Articulatory Features
    Zha, Zhuan-ling
    Hu, Jin
    Zhan, Qing-ran
    Shan, Ya-hui
    Xie, Xiang
    Wang, Jing
    Cheng, Hao-bo
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 1401 - 1405
  • [4] Voice Analysis Using Acoustic and Throat Microphones for Speech Therapy
    Mathew, Lani Rachel
    Gopakumar, K.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 173 - 174
  • [5] Knowledge Distillation for Throat Microphone Speech Recognition
    Suzuki, Takahito
    Ogata, Jun
    Tsunakawa, Takashi
    Nishida, Masafumi
    Nishimura, Masafumi
    INTERSPEECH 2019, 2019, : 461 - 465
  • [6] Combining the modified CTRANC and posterior union model for robust distant speech recognition
    Lin, Jie
    Li, Jianping
    Ming, Ji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE INFORMATION COMPUTING AND AUTOMATION, VOLS 1-3, 2008, : 1068 - +
  • [7] Combining acoustic and articulatory feature information for robust speech recognition
    Kirchhoff, K
    Fink, GA
    Sagerer, G
    SPEECH COMMUNICATION, 2002, 37 (3-4) : 303 - 319
  • [8] Learning to Rank Microphones for Distant Speech Recognition
    Cornell, Samuele
    Brutti, Alessio
    Matassoni, Marco
    Squartini, Stefano
    INTERSPEECH 2021, 2021, : 3855 - 3859
  • [9] Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones
    Sahidullah, Md.
    Thomsen, Dennis Alexander Lehmann
    Hautamaki, Rosa Gonzalez
    Kinnunen, Tomi
    Tan, Zheng-Hua
    Parts, Robert
    Pitkanen, Martti
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (01) : 44 - 56
  • [10] FUSION OF STANDARD AND ALTERNATIVE ACOUSTIC SENSORS FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Heracleous, Panikos
    Even, Jani
    Ishi, Carlos T.
    Miyashita, Takahiro
    Hagita, Norihiro
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4837 - 4840