MUSIQ: Multi-scale Image Quality Transformer

Cited by: 416
Authors
Ke, Junjie [1]
Wang, Qifei [1]
Wang, Yilin [2]
Milanfar, Peyman [1]
Yang, Feng [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Keywords
DOI
10.1109/ICCV48922.2021.00510
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ [41], SPAQ [11], and KonIQ-10k [16].
Pages: 5128-5137
Page count: 10
Related papers
48 entries in total
[11]  
Dosovitskiy Alexey, 2020, INT C LEARN REPR
[12]   Perceptual Quality Assessment of Smartphone Photography [J].
Fang, Yuming ;
Zhu, Hanwei ;
Zeng, Yan ;
Ma, Kede ;
Wang, Zhou .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :3674-3683
[13]  
Gehring J, 2017, PR MACH LEARN RES, V70
[14]   Perceptual quality prediction on authentically distorted images using a bag of features approach [J].
Ghadiyaram, Deepti ;
Bovik, Alan C. .
JOURNAL OF VISION, 2017, 17 (01)
[15]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[16]   Effective Aesthetics Prediction with Multi-level Spatially Pooled Features [J].
Hosu, Vlad ;
Goldluecke, Bastian ;
Saupe, Dietmar .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :9367-9375
[17]   KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment [J].
Hosu, Vlad ;
Lin, Hanhe ;
Sziranyi, Tamas ;
Saupe, Dietmar .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) :4041-4056
[18]   Convolutional Neural Networks for No-Reference Image Quality Assessment [J].
Kang, Le ;
Ye, Peng ;
Li, Yi ;
Doermann, David .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :1733-1740
[19]   Photo Aesthetics Ranking Network with Attributes and Content Adaptation [J].
Kong, Shu ;
Shen, Xiaohui ;
Lin, Zhe ;
Mech, Radomir ;
Fowlkes, Charless .
COMPUTER VISION - ECCV 2016, PT I, 2016, 9905 :662-679
[20]   Which Has Better Visual Quality: The Clear Blue Sky or a Blurry Animal? [J].
Li, Dingquan ;
Jiang, Tingting ;
Lin, Weisi ;
Jiang, Ming .
IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (05) :1221-1234