Performance Analysis of Deep Learning Based Speech Quality Model with Mixture of Features

Cited by: 0
Authors
Jaiswal, Rahul [1]
Affiliation
[1] Univ Agder, Dept Informat & Commun Technol, Grimstad, Norway
Source
2022 IEEE International Symposium on Multimedia (ISM), 2022
Keywords
DNN; Speech feature; Speech quality; QoE; VAD; STANDARD;
DOI
10.1109/ISM55400.2022.00053
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Speech is one of the most convenient means of communication between humans. However, speech quality deteriorates in the presence of ambient noise. To meet the end user's expected quality of experience (QoE) in applications such as Microsoft Skype and Apple FaceTime, it is important to measure and monitor speech quality in real time. To this end, this paper investigates a series of deep neural network (DNN)-based objective no-reference speech quality models (SQMs) for accurately measuring speech quality. Three speech features, namely line spectral frequencies (LSF), mel-frequency cepstral coefficients (MFCC), and a multi-resolution auditory model (MRAM), are extracted from the speech signal after it has been processed by a voice activity detector (VAD). A series of DNN-based SQMs is then developed, each using either a single speech feature or a mixture of features. The standard no-reference speech quality prediction model, ITU-T P.563, serves as the baseline. Results demonstrate that the DNN-based SQM trained with the MRAM feature measures speech quality more accurately than both the baseline model and the DNN-based SQMs trained with other speech features or their mixtures.
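The front end described in the abstract (VAD, feature extraction, optional mixing of features) and the comparison of predicted scores against subjective ratings can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the paper's implementation: the energy-threshold VAD and its -40 dB threshold, concatenation as the rule for forming a "mixture of features", and Pearson correlation as the figure of merit are all assumptions, since the abstract does not specify them. LSF, MFCC, and MRAM extraction and the DNN regressor itself are omitted; the vectors passed to the hypothetical `mix_features` helper stand in for their outputs.

```python
import math
from typing import List, Sequence


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into frames; trailing samples that do not fill a frame are dropped."""
    return [list(x[i:i + frame_len]) for i in range(0, len(x) - frame_len + 1, hop)]


def energy_vad(frames: List[List[float]], threshold_db: float = -40.0) -> List[List[float]]:
    """Keep frames whose mean energy lies within `threshold_db` of the loudest frame.

    Assumption: a simple energy-threshold VAD, used only as a stand-in for the
    detector in the paper, whose type the abstract does not state.
    """
    energies = [sum(s * s for s in f) / len(f) for f in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # all-silent input: nothing is classified as speech
    kept = []
    for frame, e in zip(frames, energies):
        # Relative level in dB; zero-energy frames are always dropped.
        if e > 0.0 and 10.0 * math.log10(e / peak) >= threshold_db:
            kept.append(frame)
    return kept


def mix_features(*feature_vectors: Sequence[float]) -> List[float]:
    """Hypothetical 'mixture of features': concatenate per-utterance vectors,
    e.g. pooled LSF + pooled MFCC + pooled MRAM (assumed combination rule)."""
    mixed: List[float] = []
    for v in feature_vectors:
        mixed.extend(v)
    return mixed


def pearson(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson correlation between predicted and subjective quality scores.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(pred)
    mp, mm = sum(pred) / n, sum(mos) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(pred, mos))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sm = math.sqrt(sum((m - mm) ** 2 for m in mos))
    return cov / (sp * sm)


if __name__ == "__main__":
    # Four loud samples followed by four silent samples.
    signal = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
    frames = frame_signal(signal, frame_len=4, hop=4)
    speech = energy_vad(frames)
    print(len(frames), len(speech))  # 2 1
    print(mix_features([0.1, 0.2], [1.0], [3.0, 4.0]))  # [0.1, 0.2, 1.0, 3.0, 4.0]
    print(round(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0
```

Concatenation keeps each feature's dimensions separate, so a downstream DNN can learn how much weight to give the LSF, MFCC, or MRAM components; other fusion rules (for example, one sub-network per feature) would be equally plausible readings of "a mixture of speech features".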
Pages: 240–244
Number of pages: 5