MUSIQ: Multi-scale Image Quality Transformer

Times cited: 437

Authors:

Ke, Junjie [1]

Wang, Qifei [1]

Wang, Yilin [2]

Milanfar, Peyman [1]

Yang, Feng [1]

Affiliations:

[1] Google Res, Mountain View, CA 94043 USA

[2] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.00510

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this constraint, input images are usually resized and cropped to a fixed shape, which degrades image quality. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method achieves state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ [41], SPAQ [11], and KonIQ-10k [16].
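The abstract names the hash-based 2D spatial embedding and the scale embedding without describing how they are computed. The Python sketch below shows one plausible way to index such embeddings for a multi-scale patch sequence. It assumes an aspect-ratio-preserving resize to longer sides of 224 and 384 pixels plus the native-resolution image, 32×32 patches, and a 10×10 learnable spatial table (G = 10); these values and every function name (`resize_targets`, `patch_grid`, `hash_position`, `token_indices`) are illustrative assumptions, not details stated in this record. Each patch at (i, j) in an H×W patch grid is hashed to table cell (⌊iG/H⌋, ⌊jG/W⌋), so images of any size share one fixed-size table.

```python
import math

# Illustrative assumptions, not values taken from this record:
PATCH = 32                 # patch side length in pixels
GRID = 10                  # G: the spatial embedding table is G x G
LONGER_SIDES = (224, 384)  # longer-side targets of the resized copies


def resize_targets(height, width, longer_sides=LONGER_SIDES):
    """Aspect-ratio-preserving (height, width) pairs whose longer side equals each target."""
    longest = max(height, width)
    return [
        (max(1, round(height * side / longest)), max(1, round(width * side / longest)))
        for side in longer_sides
    ]


def patch_grid(height, width, patch=PATCH):
    """Patch rows and columns; a partial border patch is counted (assumes padding)."""
    return math.ceil(height / patch), math.ceil(width / patch)


def hash_position(i, j, rows, cols, grid=GRID):
    """Map patch (i, j) of a rows x cols grid to cell (t_i, t_j) of the G x G table."""
    return (i * grid) // rows, (j * grid) // cols


def token_indices(height, width, longer_sides=LONGER_SIDES):
    """Return (scale_index, t_i, t_j) for every patch of every scale.

    Scales 0..len(longer_sides)-1 are the resized copies and the last scale is the
    native-resolution image (this ordering is an assumption). A model would add
    scale_embedding[scale_index] and spatial_table[t_i][t_j] to each patch token.
    """
    sizes = resize_targets(height, width, longer_sides) + [(height, width)]
    tokens = []
    for scale_index, (h, w) in enumerate(sizes):
        rows, cols = patch_grid(h, w)
        for i in range(rows):
            for j in range(cols):
                tokens.append((scale_index, *hash_position(i, j, rows, cols)))
    return tokens


print(resize_targets(768, 1024))      # [(168, 224), (288, 384)]
print(patch_grid(168, 224))           # (6, 7)
print(len(token_indices(768, 1024)))  # 918
```

Because the table stays G×G regardless of input resolution, a 6×7 patch grid and a 24×32 patch grid index the same set of embeddings, so variable-size inputs need no positional-embedding interpolation in this sketch.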

Pages: 5128-5137

Number of pages: 10

References (48 in total):

[11] Dosovitskiy Alexey, 2020, INT C LEARN REPR

[12] Perceptual Quality Assessment of Smartphone Photography [J]. Fang, Yuming; Zhu, Hanwei; Zeng, Yan; Ma, Kede; Wang, Zhou. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 3674-3683

[13] Gehring J, 2017, PR MACH LEARN RES, V70

[14] Perceptual quality prediction on authentically distorted images using a bag of features approach [J]. Ghadiyaram, Deepti; Bovik, Alan C. JOURNAL OF VISION, 2017, 17 (01)

[15] Deep Residual Learning for Image Recognition [J]. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778

[16] Effective Aesthetics Prediction with Multi-level Spatially Pooled Features [J]. Hosu, Vlad; Goldluecke, Bastian; Saupe, Dietmar. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 9367-9375

[17] KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment [J]. Hosu, Vlad; Lin, Hanhe; Sziranyi, Tamas; Saupe, Dietmar. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29): 4041-4056

[18] Convolutional Neural Networks for No-Reference Image Quality Assessment [J]. Kang, Le; Ye, Peng; Li, Yi; Doermann, David. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 1733-1740

[19] Photo Aesthetics Ranking Network with Attributes and Content Adaptation [J]. Kong, Shu; Shen, Xiaohui; Lin, Zhe; Mech, Radomir; Fowlkes, Charless. COMPUTER VISION - ECCV 2016, PT I, 2016, 9905: 662-679

[20] Which Has Better Visual Quality: The Clear Blue Sky or a Blurry Animal? [J]. Li, Dingquan; Jiang, Tingting; Lin, Weisi; Jiang, Ming. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (05): 1221-1234

← 1 2 3 4 5 →