Study of Subjective and Objective Quality Assessment of Audio-Visual Signals

Cited by: 232
Authors
Min, Xiongkuo [1]
Zhai, Guangtao [1]
Zhou, Jiantao [2, 3]
Farias, Mylene C. Q. [4]
Bovik, Alan Conrad [5]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai 200240, Peoples R China
[2] Univ Macau, State Key Lab Internet Things Smart City, Macau, Peoples R China
[3] Univ Macau, Dept Comp & Informat Sci, Macau, Peoples R China
[4] Univ Brasilia, Dept Elect Engn, BR-70853060 Brasilia, DF, Brazil
[5] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
Funding
National Natural Science Foundation of China
Keywords
Streaming media; Distortion; Databases; Quality assessment; Quality of experience; Video recording; Predictive models; audio-visual quality; video quality; audio quality; multimodal fusion; AUDIO QUALITY; VIDEO QUALITY; STRUCTURAL SIMILARITY; PERCEPTUAL IMAGE; SPEECH;
DOI
10.1109/TIP.2020.2988148
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The topics of visual and audio quality assessment (QA) have been widely researched for decades, yet nearly all of this prior work has focused only on single-mode visual or audio signals. However, visual signals are rarely presented without accompanying audio, including in heavy-bandwidth video streaming applications. Moreover, the distortions that may separately (or conjointly) afflict the visual and audio signals collectively shape user-perceived quality of experience (QoE). This motivated us to conduct a subjective study of audio and video (A/V) quality, which we then used to compare and develop A/V quality measurement models and algorithms. The new LIVE-SJTU Audio and Video Quality Assessment (A/V-QA) Database includes 336 A/V sequences that were generated from 14 original source contents by applying 24 different A/V distortion combinations to each of them. We then conducted a subjective A/V quality perception study on the database to better understand how humans perceive the overall combined quality of A/V signals. We also designed four different families of objective A/V quality prediction models, using a multimodal fusion strategy. The families differ both in the unimodal audio and video quality prediction models that supply the direct signal measurements and in the way that the two perceptual signal modes are combined. The objective models are built using existing state-of-the-art audio and video quality prediction models, some new prediction models, and quality-predictive features delivered by a deep neural network. The methods considered for fusing audio and video quality predictions include simple product combinations as well as learned mappings. Using the new subjective A/V database as a tool, we validated and tested all of the objective A/V quality prediction models. We will make the database publicly available to facilitate further research.
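As a rough illustration of the two fusion styles the abstract names, the sketch below contrasts a product combination with a learned mapping. It is not the authors' implementation: the function names are hypothetical, the unimodal scores are assumed to be pre-scaled to [0, 1], an ordinary least-squares linear map stands in for the learned mapping (the abstract does not say which learner was used), and all numbers are synthetic toy values rather than data from the LIVE-SJTU database.

```python
import numpy as np


def product_fusion(q_video, q_audio):
    """Fuse unimodal scores (assumed scaled to [0, 1]) by multiplication."""
    return np.asarray(q_video, dtype=float) * np.asarray(q_audio, dtype=float)


def fit_linear_fusion(q_video, q_audio, mos):
    """Fit mos ~ w_v * q_video + w_a * q_audio + b by ordinary least squares.

    A linear map is only a stand-in for a "learned mapping"; any regressor
    trained on subjective scores could take its place.
    """
    q_video = np.asarray(q_video, dtype=float)
    q_audio = np.asarray(q_audio, dtype=float)
    design = np.column_stack([q_video, q_audio, np.ones_like(q_video)])
    weights, *_ = np.linalg.lstsq(design, np.asarray(mos, dtype=float), rcond=None)
    return weights


def predict_linear_fusion(weights, q_video, q_audio):
    """Apply the fitted linear map to new unimodal scores."""
    w_v, w_a, b = weights
    q_video = np.asarray(q_video, dtype=float)
    q_audio = np.asarray(q_audio, dtype=float)
    return w_v * q_video + w_a * q_audio + b


# Synthetic toy inputs (NOT from the database): unimodal quality predictions
# and a made-up linear "subjective score" used only to exercise the fit.
q_v = np.array([0.9, 0.7, 0.5, 0.3, 0.8, 0.4])
q_a = np.array([0.8, 0.9, 0.4, 0.6, 0.3, 0.2])
toy_mos = 60.0 * q_v + 30.0 * q_a + 5.0

fused_product = product_fusion(q_v, q_a)
weights = fit_linear_fusion(q_v, q_a, toy_mos)
fused_learned = predict_linear_fusion(weights, q_v, q_a)
```

The product form needs no training data but forces a fixed interaction between the two modes, whereas a learned mapping can weight audio and video differently at the cost of requiring subjective scores to fit against.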
Pages: 6054-6068
Page count: 15