Enhancement of speech intelligibility under noisy reverberant conditions based on modulation spectrum concept

Cited by: 0
Authors
Van Ngo, Thuan [1]
Ho, Tuan Vu [1]
Unoki, Masashi [1]
Kubo, Rieko [2]
Akagi, Masato [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[2] Natl Inst Informat & Commun Technol, Koganei, Tokyo, Japan
Source
2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2020
Funding
Japan Science and Technology Agency (JST);
Keywords
Speech intelligibility; modulation spectrum; modulation transfer function; smeared modulation spectrum; ROOM ACOUSTICS; PERCEPTION;
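DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract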
This study focuses on identifying effective features for controlling speech to increase its intelligibility under adverse conditions. Previous methods either reduce noise and reverberation during speech presentation or enhance speech before presentation by controlling its intensity and/or spectral properties. Among them, a method based on modulation transfer function theory, in which the environmental effects are inverted to anticipate attenuation of the modulation spectrum of speech, shows excellent potential because it derives intelligibility enhancement against environmental smearing systematically and explicitly. However, obtaining that inversion directly requires estimating the modulation transfer function, and this estimation appears complicated and unreliable under realistic, variable conditions. This study takes a different approach: it analyzes how modulation spectra smeared by the environment relate to intelligibility in order to extract features worth modifying. First, we conduct listening tests on intelligibility in noise with different types of enhanced speech. Next, we extract the acoustic and modulation frequency components of the noise-smeared modulation spectra that correlate highly with intelligibility scores. Finally, we examine the intelligibility benefit of modifying these components through further listening tests. The results show that modifying these components increases intelligibility by up to 20%, which supports the validity of our concept.
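To make concrete the "inversion of environmental effects" that the abstract refers to, the following minimal sketch evaluates the classical modulation transfer function for reverberation and stationary noise as formulated by Houtgast and Steeneken (1985), together with the gain that would undo its attenuation. It is an illustration only and not the method of the paper: the function names, the gain cap `max_gain`, and the example values of reverberation time and SNR are assumptions introduced here.

```python
import math


def mtf(f_mod_hz: float, rt60_s: float, snr_db: float) -> float:
    """Classical MTF: modulation reduction at modulation frequency f_mod_hz.

    reverberation term: 1 / sqrt(1 + (2*pi*f_mod*T60 / 13.8)^2)
    noise term:         1 / (1 + 10^(-SNR/10))
    """
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod_hz * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def inverse_gain(f_mod_hz: float, rt60_s: float, snr_db: float,
                 max_gain: float = 4.0) -> float:
    """Gain that would compensate the MTF attenuation.

    The cap max_gain is a hypothetical safeguard, not taken from the source.
    """
    return min(1.0 / mtf(f_mod_hz, rt60_s, snr_db), max_gain)


if __name__ == "__main__":
    # Assumed example conditions: T60 = 1.0 s, SNR = 0 dB.
    for f in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
        m = mtf(f, rt60_s=1.0, snr_db=0.0)
        g = inverse_gain(f, rt60_s=1.0, snr_db=0.0)
        print(f"f_mod={f:5.1f} Hz  m={m:.3f}  compensating gain={g:.2f}")
```

Computing this gain requires knowing the reverberation time and SNR in advance, which is the estimation problem the abstract describes as difficult under realistic conditions.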
Pages: 753-758
Number of pages: 6