Nonintrusive Perceptual Audio Quality Assessment for User-Generated Content Using Deep Learning

Cited by: 6
Authors
Mumtaz, Deebha [1]
Jakhetiya, Vinit [1]
Nathwani, Karan [2]
Subudhi, Badri Narayan [2]
Guntuku, Sharath Chandra [3]
Affiliations
[1] Indian Inst Technol Jammu, Dept Comp Sci & Engn, Jammu 181221, India
[2] Indian Inst Technol Jammu, Dept Elect Engn, Jammu 181221, India
[3] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
Keywords
Quality assessment; Streaming media; Measurement; Spectrogram; Speech recognition; Bit rate; Background noise; Audio quality assessment; deep learning; gated recurrent unit (GRU); non-intrusive quality metric; user-generated multimedia (UGM); SPEECH; GRU
DOI
10.1109/TII.2021.3139010
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the boom of social media communication, teleconferencing, and online classes, audiovisual communication over bandwidth-strained networks has become an integral part of our lives. Consequently, the growing demand for quality of experience necessitates developing algorithms to measure and enrich user experience. Within audio quality assessment, prior studies have mainly focused on speech quality and intelligibility, while other categories of user-generated multimedia (UGM) are less explored. Moreover, the frequency-domain properties of speech and UGM audio differ significantly. Furthermore, there is no standard dataset for the quality assessment of UGM. Considering these limitations, in this article, we first develop the IIT-JMU-UGM audio dataset consisting of 1150 audio clips, with diverse context, content, and types of degradation commonly observed in real-world scenarios, annotated with subjective quality scores. We then propose a non-intrusive audio quality assessment metric using a stacked gated-recurrent-unit-based deep learning framework. The proposed model outperforms several baseline methods, including state-of-the-art non-intrusive and intrusive approaches. The resulting Pearson's correlation coefficient of 0.834 indicates that the proposed method efficiently mirrors human auditory perception.
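The Pearson's correlation coefficient cited in the abstract measures the linear agreement between a metric's predicted quality scores and the subjective scores given by listeners. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name and the score values are hypothetical and chosen only for illustration.

```python
import math


def pearson_correlation(predicted, subjective):
    """Pearson's linear correlation between two equal-length score lists."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    # Sums of squared deviations (unnormalized variances)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


# Hypothetical scores for illustration only; these are not data from the paper.
predicted_scores = [3.1, 4.2, 2.0, 4.8, 1.5]
subjective_scores = [3.0, 4.0, 2.5, 4.5, 1.0]
plcc = pearson_correlation(predicted_scores, subjective_scores)
print(f"PLCC = {plcc:.3f}")
```

A value near 1 means the predicted scores track the subjective ratings closely, while a value near 0 means there is no linear relationship.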
Pages: 7780-7789
Number of pages: 10