Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks

Cited by: 4
Authors
Sharma, Dushyant [1 ]
Hogg, Aidan O. T. [2 ]
Wang, Yu [3 ]
Nour-Eldin, Amr [1 ]
Naylor, Patrick A. [2 ]
Affiliations
[1] Nuance Communications, Burlington, MA 01803, USA
[2] Imperial College London, Department of Electrical and Electronic Engineering, London, England
[3] University of Cambridge, Department of Engineering, Cambridge, England
Source
2019 27th European Signal Processing Conference (EUSIPCO), 2019
Keywords
speech quality estimation; POLQA estimation; deep neural networks; intelligibility; channels; standard
DOI
10.23919/eusipco.2019.8902646
Chinese Library Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology]
Subject Classification Code
0808; 0809
Abstract
Estimating the quality of speech without the use of a clean reference signal is a challenging problem, in part due to the time and expense required to collect sufficient training data for modern machine learning algorithms. We present a novel, non-intrusive estimator that exploits recurrent neural network architectures to predict the intrusive POLQA score of a speech signal over a short time context. The predictor is based on a novel compressed representation of modulation domain features, used in conjunction with static MFCC features. We show that the proposed method can reliably predict POLQA with a 300 ms context, achieving a mean absolute error of 0.21 on unseen data. The proposed method is trained on English speech and is shown to generalize well to unseen languages. The neural network also jointly estimates mean voice activity with an F1 score of 0.9, removing the need for an external voice activity detector (VAD).
Pages: 5
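As a rough illustration of the system described in the abstract, the sketch below pairs per-frame MFCC features with a small LSTM that jointly outputs one POLQA estimate per ~300 ms context and frame-level voice-activity probabilities. This is a minimal sketch, not the authors' implementation: the use of librosa for feature extraction, the layer sizes, the 10 ms hop, and the omission of the compressed modulation-domain features are all assumptions made for brevity.

```python
# Illustrative multi-task recurrent estimator: an LSTM consumes per-frame
# features over a ~300 ms context and jointly predicts a POLQA score
# (regression) and per-frame voice-activity probabilities.
# Feature choices and layer sizes are assumptions, not the paper's exact setup.
import librosa
import numpy as np
import torch
import torch.nn as nn


class PolqaVadEstimator(nn.Module):
    def __init__(self, n_feats=40, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
        self.polqa_head = nn.Linear(hidden, 1)  # regression head: POLQA score
        self.vad_head = nn.Linear(hidden, 1)    # per-frame voice-activity logit

    def forward(self, x):                        # x: (batch, frames, n_feats)
        h, _ = self.rnn(x)                       # h: (batch, frames, hidden)
        polqa = self.polqa_head(h[:, -1, :])     # one score per 300 ms context
        vad = torch.sigmoid(self.vad_head(h))    # frame-level VAD probability
        return polqa.squeeze(-1), vad.squeeze(-1)


def mfcc_contexts(wav, sr=16000, context_ms=300, n_mfcc=40):
    """Split a waveform into ~300 ms contexts of static MFCC frames.
    (The paper also uses compressed modulation-domain features; only the
    static-MFCC part is sketched here.)"""
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=n_mfcc,
                                hop_length=160).T  # (frames, n_mfcc), 10 ms hop
    frames_per_ctx = context_ms // 10
    n_ctx = len(mfcc) // frames_per_ctx
    return mfcc[:n_ctx * frames_per_ctx].reshape(n_ctx, frames_per_ctx, n_mfcc)


if __name__ == "__main__":
    wav = np.random.randn(16000).astype(np.float32)  # 1 s of dummy audio
    feats = torch.from_numpy(mfcc_contexts(wav))     # (3, 30, 40)
    polqa, vad = PolqaVadEstimator()(feats)
    print(polqa.shape, vad.shape)                    # (3,), (3, 30)
```

In a multi-task setup of this kind, training would typically combine a regression loss on the POLQA output (e.g. mean squared error against intrusive POLQA labels) with a binary cross-entropy loss on the per-frame VAD output, so both tasks share the recurrent representation.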
相关论文
共 50 条
[21]   Multi-objective non-intrusive hearing-aid speech assessment model [J].
Chiang, Hsin-Tien ;
Fu, Szu-Wei ;
Wang, Hsin-Min ;
Tsao, Yu ;
Hansen, John H. L. .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2024, 156 (05) :3574-3587
[22]   SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS [J].
Graves, Alex ;
Mohamed, Abdel-rahman ;
Hinton, Geoffrey .
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, :6645-6649
[23]   AN END-TO-END NON-INTRUSIVE MODEL FOR SUBJECTIVE AND OBJECTIVE REAL-WORLD SPEECH ASSESSMENT USING A MULTI-TASK FRAMEWORK [J].
Zhang, Zhuohuang ;
Vyas, Piyush ;
Dong, Xuan ;
Williamson, Donald S. .
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, :316-320
[24]   Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants [J].
Goehring, Tobias ;
Keshavarzi, Mahmoud ;
Carlyon, Robert P. ;
Moore, Brian C. J. .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 146 (01) :705-718
[25]   Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks [J].
Gu, Yu ;
Ling, Zhen-Hua ;
Dai, Li-Rong .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :297-301
[26]   Ideal ratio mask estimation using deep neural networks for monaural speech segregation in noisy reverberant conditions [J].
Li, Xu ;
Li, Junfeng ;
Yan, Yonghong .
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, :1203-1207
[27]   On Learning Spectral Masking for Single Channel Speech Enhancement Using Feedforward and Recurrent Neural Networks [J].
Saleem, Nasir ;
Khattak, Muhammad Irfan ;
Al-Hasan, Muath ;
Qazi, Abdul Baseer .
IEEE ACCESS, 2020, 8 :160581-160595
[28]   LOMBARD SPEECH SYNTHESIS USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS [J].
Bollepalli, Bajibabu ;
Airaksinen, Manu ;
Alku, Paavo .
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, :5505-5509
[29]   Deep Elman recurrent neural networks for statistical parametric speech synthesis [J].
Achanta, Sivanand ;
Gangashetty, Suryakanth V. .
SPEECH COMMUNICATION, 2017, 93 :31-42
[30]   Synthetic Speech Detection Using Neural Networks [J].
Reimao, Ricardo ;
Tzerpos, Vassilios .
2021 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2021, :97-102