MUSIQ: Multi-scale Image Quality Transformer

Cited by: 416

Authors:

Ke, Junjie [1]

Wang, Qifei [1]

Wang, Yilin [2]

Milanfar, Peyman [1]

Yang, Feng [1]

Affiliations:

[1] Google Res, Mountain View, CA 94043 USA

[2] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.00510

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this constraint, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ [41], SPAQ [11], and KonIQ-10k [16].
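The abstract names a multi-scale representation, a hash-based 2D spatial embedding, and a scale embedding without giving their details. The sketch below is one illustrative reading of those ideas, not the authors' implementation: it only computes the index triple (scale, hashed row, hashed column) that such embeddings could be looked up with. The patch size of 32, the resize targets 224 and 384, the grid size of 10, the border padding, and every function name are assumptions made for this example.

```python
import math

def resize_keep_aspect(height, width, longer_side):
    """Scale (height, width) so that the longer side equals longer_side."""
    scale = longer_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))

def patch_grid(height, width, patch_size):
    """Patch rows and columns, assuming borders are padded to a full patch."""
    return math.ceil(height / patch_size), math.ceil(width / patch_size)

def hash_position(i, j, rows, cols, grid_size):
    """Map patch (i, j) of a rows x cols grid to a cell of a fixed
    grid_size x grid_size table, so any image shape shares one table."""
    return (i * grid_size) // rows, (j * grid_size) // cols

def multiscale_tokens(height, width, patch_size=32,
                      longer_sides=(224, 384), grid_size=10):
    """List (scale_id, i, j, ti, tj) for every patch at every scale.

    scale_id 0 is the native resolution; the others are aspect-ratio
    preserving resizes. (ti, tj) would index a spatial embedding table
    and scale_id a scale embedding table (both hypothetical here).
    """
    sizes = [(height, width)]
    sizes += [resize_keep_aspect(height, width, s) for s in longer_sides]
    tokens = []
    for scale_id, (h, w) in enumerate(sizes):
        rows, cols = patch_grid(h, w, patch_size)
        for i in range(rows):
            for j in range(cols):
                ti, tj = hash_position(i, j, rows, cols, grid_size)
                tokens.append((scale_id, i, j, ti, tj))
    return tokens

if __name__ == "__main__":
    tokens = multiscale_tokens(480, 640)
    print(len(tokens), tokens[0], tokens[-1])
```

Under these assumptions the token count varies with the input shape, while the hashed indices always fall inside the fixed grid, which is what would let a single embedding table serve images of any size and aspect ratio.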

Pages: 5128-5137

Page count: 10

References (48 entries in total):

[1] Adelson E., 1984, RCA Eng., V29, P33

[2] [Anonymous], 2020, P IEEE C COMP VIS PA

[3] Bello, Irwan; Zoph, Barret; Vaswani, Ashish; Shlens, Jonathon; Le, Quoc V. Attention Augmented Convolutional Networks. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3285-3294

[4] Bosse, Sebastian; Maniry, Dominique; Mueller, Klaus-Robert; Wiegand, Thomas; Samek, Wojciech. Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (01): 206-219

[5] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229

[6] Chen Huigang, 2020, CAUSALML PYTHON PACK

[7] Chen Ming, 2020, P MACHINE LEARNING R, V119

[8] Choi, Kiwoon; Jung, Jaehoon; Kim, Jongyoung; Lee, Joonho; Lee, Han Sup; Kang, Il-Suk. Antireflective Transparent Conductive Oxide Film Based on a Tapered Porous Nanostructure. MICROMACHINES, 2020, 11 (02)

[9] Cubuk, Ekin D.; Zoph, Barret; Shlens, Jonathon; Le, Quoc V. Randaugment: Practical automated data augmentation with a reduced search space. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020: 3008-3017

[10] Devlin J., 2018, arXiv:1810.04805
