Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Cited by: 57
Authors
Lao, Shanshan [1 ]
Gong, Yuan [1 ]
Shi, Shuwei [1 ]
Yang, Sidi [1 ]
Wu, Tianhe [1 ]
Wang, Jiahao [1 ]
Xia, Weihao [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] UCL, London, England
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPRW56347.2022.00123
CLC number
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
Image quality assessment (IQA) algorithms aim to quantify the human perception of image quality. Unfortunately, existing methods suffer a performance drop when assessing distorted images generated by generative adversarial networks (GANs), which contain seemingly realistic textures. In this work, we conjecture that this maladaptation lies in the backbone of IQA models: patch-level prediction methods take independent image patches as input and compute their scores separately, and therefore fail to model the spatial relationships among patches. We therefore propose an Attention-based Hybrid Image Quality Assessment Network (AHIQ) to address this challenge and achieve better performance on the GAN-based IQA task. First, we adopt a two-branch architecture consisting of a vision transformer (ViT) branch and a convolutional neural network (CNN) branch for feature extraction. This hybrid architecture combines the inter-patch interaction information captured by the ViT with the local texture details extracted by the CNN. To make the features from the shallow CNN layers focus on visually salient regions, a deformable convolution is applied with the guidance of semantic information from the ViT branch. Finally, a patch-wise score prediction module produces the final score. Experiments show that our model outperforms state-of-the-art methods on four standard IQA datasets, and AHIQ ranked first on the Full-Reference (FR) track of the NTIRE 2022 Perceptual Image Quality Assessment Challenge. Code and pretrained models are publicly available at https://github.com/IIGROUP/AHIQ.
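The patch-wise score prediction step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes (as in common patch-weighting FR-IQA schemes) that the network emits a score and an attention weight per patch, and that the final quality score is the softmax-weighted average of the per-patch scores. All names here are hypothetical.

```python
import numpy as np

def patch_wise_score(patch_scores, patch_weights):
    """Hypothetical sketch of a patch-wise score prediction module.

    patch_scores  : per-patch quality predictions (one value per image patch)
    patch_weights : per-patch attention logits from the network
    Returns the final image score as a softmax-weighted average of patch scores.
    """
    s = np.asarray(patch_scores, dtype=float)
    w = np.asarray(patch_weights, dtype=float)
    # Softmax-normalize the attention logits so the weights sum to 1
    w = np.exp(w - w.max())
    w = w / w.sum()
    return float((w * s).sum())
```

With equal weights this reduces to a plain mean over patches; unequal weights let salient patches dominate the final score, which is the point of combining attention with patch-level prediction.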
Pages: 1139-1148
Page count: 10
References
57 in total
[1] [Anonymous], 2020, CVPR, DOI 10.1109/CVPR42600.2020.00583
[2] [Anonymous], 2017, ICCV, DOI 10.1109/ICCV.2017.320
[3] [Anonymous], 2020, CVPR, DOI 10.1109/CVPR42600.2020.00308
[4] [Anonymous], 2021, CVPR, DOI 10.1109/CVPR46437.2021.00229
[5] Bae Sung-Ho, 2016, IEEE TIP
[6] Bertasius, Gedas; Torresani, Lorenzo; Shi, Jianbo. Object Detection in Video with Spatiotemporal Sampling Networks. COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216: 342-357
[7] Blau, Yochai; Michaeli, Tomer. The Perception-Distortion Tradeoff. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6228-6237
[8] Bosse Sebastian, 2017, IEEE TIP
[9] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229
[10] Chan Kelvin CK, 2021, AAAI