MUSIQ: Multi-scale Image Quality Transformer

Cited by: 416
Authors
Ke, Junjie [1]
Wang, Qifei [1]
Wang, Yilin [2]
Milanfar, Peyman [1]
Yang, Feng [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
DOI
10.1109/ICCV48922.2021.00510
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ [41], SPAQ [11], and KonIQ-10k [16].
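The abstract names a hash-based 2D spatial embedding but does not spell out how it works. The sketch below illustrates one plausible reading: each patch position in a variable-sized patch grid is mapped proportionally onto a fixed grid of embedding slots, so images of any size or aspect ratio share one embedding table. The function names, the proportional floor mapping, and the `grid_size` parameter are illustrative assumptions, not details taken from this record.

```python
def hash_patch_position(row, col, n_rows, n_cols, grid_size):
    """Map a patch position to a cell of a fixed grid_size x grid_size grid.

    Assumption (not stated in the abstract): the mapping is a proportional
    floor, so position (row, col) in an n_rows x n_cols patch grid lands in
    cell (row * grid_size // n_rows, col * grid_size // n_cols).
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("patch position lies outside the patch grid")
    grid_row = row * grid_size // n_rows
    grid_col = col * grid_size // n_cols
    return grid_row, grid_col


def spatial_embedding_indices(n_rows, n_cols, grid_size):
    """Return one flat embedding-table index per patch, in row-major order.

    Each index addresses a hypothetical table with grid_size * grid_size rows.
    """
    indices = []
    for row in range(n_rows):
        for col in range(n_cols):
            grid_row, grid_col = hash_patch_position(
                row, col, n_rows, n_cols, grid_size
            )
            indices.append(grid_row * grid_size + grid_col)
    return indices


# Two images with different patch-grid shapes index the same 10 x 10 table.
wide_image = spatial_embedding_indices(n_rows=3, n_cols=8, grid_size=10)
tall_image = spatial_embedding_indices(n_rows=9, n_cols=4, grid_size=10)
```

Under this assumption, every index stays below `grid_size * grid_size` regardless of the input shape, which is what lets a single learned table serve images at native resolution.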
Pages: 5128-5137
Page count: 10
Related papers
48 entries in total
[1]  
Adelson E., 1984, RCA Eng., V29, P33
[2]  
[Anonymous], 2020, P IEEE C COMP VIS PA
[3]   Attention Augmented Convolutional Networks [J].
Bello, Irwan ;
Zoph, Barret ;
Vaswani, Ashish ;
Shlens, Jonathon ;
Le, Quoc V.
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :3285-3294
[4]   Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment [J].
Bosse, Sebastian ;
Maniry, Dominique ;
Mueller, Klaus-Robert ;
Wiegand, Thomas ;
Samek, Wojciech .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (01) :206-219
[5]   End-to-End Object Detection with Transformers [J].
Carion, Nicolas ;
Massa, Francisco ;
Synnaeve, Gabriel ;
Usunier, Nicolas ;
Kirillov, Alexander ;
Zagoruyko, Sergey .
COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229
[6]  
Chen Huigang, 2020, CAUSALML PYTHON PACK
[7]  
Chen Ming, 2020, P MACHINE LEARNING R, V119
[8]   Antireflective Transparent Conductive Oxide Film Based on a Tapered Porous Nanostructure [J].
Choi, Kiwoon ;
Jung, Jaehoon ;
Kim, Jongyoung ;
Lee, Joonho ;
Lee, Han Sup ;
Kang, Il-Suk .
MICROMACHINES, 2020, 11 (02)
[9]   Randaugment: Practical automated data augmentation with a reduced search space [J].
Cubuk, Ekin D. ;
Zoph, Barret ;
Shlens, Jonathon ;
Le, Quoc V.
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, :3008-3017
[10]  
Devlin J., 2018, arXiv:1810.04805