End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks

Cited by: 9
Authors
Pedersen, Mathias B. [1]
Kolbaek, Morten [1]
Andersen, Asger H. [2]
Jensen, Soren H. [1]
Jensen, Jesper [1, 2]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
[2] Oticon AS, Smorum, Denmark
Source
INTERSPEECH 2020 | 2020
Keywords
speech intelligibility prediction; fully convolutional neural networks; deep learning; ENHANCEMENT; NOISE; ALGORITHM; QUALITY
DOI
10.21437/Interspeech.2020-1740
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, and so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics are still the state of the art. This work proposes a U-Net-inspired fully convolutional neural network architecture, NSIP, trained and tested on ten datasets to predict intelligibility of time-domain speech. The architecture is compared to a frequency-domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior for datasets seen in the training phase. On unseen datasets, NSIP reaches performance comparable to classical predictors.
Pages: 1151-1155
Number of pages: 5