Deep Learning-Based Noise Reduction Approach to Improve Speech Intelligibility for Cochlear Implant Recipients

Cited by: 60
Authors
Lai, Ying-Hui [1]
Tsao, Yu [2]
Lu, Xugang [3]
Chen, Fei [4]
Su, Yu-Ting [5]
Chen, Kuang-Chao [6,7]
Chen, Yu-Hsuan [8]
Chen, Li-Ching [7]
Li, Lieber Po-Hung [7,9]
Lee, Chin-Hui [10]
Affiliations
[1] Natl Yang Ming Univ, Dept Biomed Engn, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[3] Natl Inst Informat & Commun Technol, Tokyo, Japan
[4] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China
[5] Natl Taiwan Normal Univ, Dept Mechatron Engn, Taipei, Taiwan
[6] Far Eastern Mem Hosp, Dept Otolaryngol, New Taipei, Taiwan
[7] Cheng Hsin Gen Hosp, Dept Otolaryngol, 45 Cheng Hsin St, Taipei, Taiwan
[8] Cheng Hsin Gen Hosp, Dept Internal Med, Taipei, Taiwan
[9] Natl Yang Ming Univ, Sch Med, Fac Med, Taipei, Taiwan
[10] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China;
Keywords
Cochlear implant; Deep denoising autoencoder; Deep learning; Noise reduction; DENOISING AUTOENCODER; SUBSPACE APPROACH; NEURAL-NETWORKS; VOCODED SPEECH; DYNAMIC-RANGE; RECOGNITION; ENHANCEMENT; HEARING; PERFORMANCE; ALGORITHMS;
DOI
10.1097/AUD.0000000000000537
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Objective: We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach for Mandarin-speaking cochlear implant (CI) recipients under noisy conditions with challenging noise types at low signal-to-noise ratio (SNR) levels.
Design: The deep learning-based NR approach used in this study consists of two modules, a noise classifier (NC) and a deep denoising autoencoder (DDAE), and is thus termed NC + DDAE. In a series of comprehensive experiments, we conduct qualitative and quantitative analyses of the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR approach and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise and construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches using amplitude envelope and spectrogram plots of the processed utterances. Quantitative measures include (1) the normalized covariance measure, used to assess the intelligibility of the utterances processed by each NR approach; and (2) speech recognition tests conducted with nine Mandarin-speaking CI recipients. These nine CI recipients use their own clinical speech processors during testing.
Results: The results of the objective evaluation and the listening tests indicate that, under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two classical NR techniques it is compared with, under both matched and mismatched training-testing conditions.
Conclusions: Compared with the two well-known conventional NR techniques under challenging listening conditions, the proposed NC + DDAE NR approach has superior noise suppression capabilities and introduces less distortion of the key speech envelope information, thus improving speech recognition more effectively for Mandarin-speaking CI recipients. The results suggest that the proposed deep learning-based NR approach can potentially be integrated into existing CI signal processors to overcome the degradation of speech perception caused by noise.
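The abstract describes test sentences corrupted by a masker at 0 and 5 dB SNR. As an illustration only, the following is a minimal sketch of how a noise signal can be scaled to reach a target SNR before being added to clean speech. It is not the authors' code: the function names `mean_power`, `mix_at_snr`, and `measure_snr_db` are hypothetical, the SNR is assumed to be defined over the whole utterance as a ratio of mean powers, and a sine tone and Gaussian noise stand in for the Mandarin sentences and the babble or jackhammer maskers.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals
    `target_snr_db` (in dB), then add it to `speech`.

    Returns the noisy mixture and the scaled noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for P_noise.
    target_noise_power = mean_power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def measure_snr_db(speech, noise):
    """SNR in dB computed from the mean powers of the two signals."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


# Stand-in signals (assumptions): a 220 Hz tone as "speech" and Gaussian
# noise as the masker, one second at a 16 kHz sampling rate.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (0.0, 5.0):
    noisy, scaled = mix_at_snr(speech, noise, target)
    print(target, round(measure_snr_db(speech, scaled), 6))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, which is one common choice; the record does not say which signal was scaled in the study.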
Pages: 795-809
Number of pages: 15