MAXIM: Multi-Axis MLP for Image Processing

Times cited: 349
Authors
Tu, Zhengzhong [1, 2]
Talebi, Hossein [1]
Zhang, Han [1]
Yang, Feng [1]
Milanfar, Peyman [1]
Bovik, Alan [2]
Li, Yinxiao [1]
Affiliations
[1] Google Res, Austin, TX 78712 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
Keywords
NETWORK;
DOI
10.1109/CVPR52688.2022.00568
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Recent progress on Transformers and multilayer perceptron (MLP) models provides new network architectural designs for computer vision tasks. Although these models have proved effective in many vision tasks such as image recognition, challenges remain in adapting them for low-level vision. The inability to support high-resolution images flexibly and the limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP-based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature conditioning. Both modules are based exclusively on MLPs, yet they also benefit from being both global and 'fully-convolutional', two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement, while requiring fewer or comparable numbers of parameters and FLOPs compared with competitive models. The source code and trained models will be available at https://github.com/google-research/maxim.
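The "multi-axis" idea in the abstract (mixing local and global cues at a cost that scales with image size) can be illustrated with a minimal pure-Python sketch. It assumes, based on our reading of the full paper rather than this abstract, that the local branch mixes pixels inside non-overlapping b x b blocks, that the global branch mixes pixels sampled on a fixed g x g grid (a dilated window spanning the whole image), and that mixing uses a gMLP-style spatial gate. The function names `block_partition`, `grid_partition`, and `gated_mix` and the one-`u`/one-`v`-channel-per-pixel layout are hypothetical simplifications, not the authors' released code.

```python
# Hypothetical, simplified sketch of MAXIM-style multi-axis spatial mixing.
# Assumptions (not the authors' released code): one scalar "u" and one scalar
# "v" channel per pixel, and a gMLP-style gate out = u * (W @ v + bias)
# applied independently inside every block or grid group.
from typing import List, Tuple

Coord = Tuple[int, int]
Feats = List[List[Tuple[float, float]]]  # feats[y][x] = (u, v)


def block_partition(h: int, w: int, b: int) -> List[List[Coord]]:
    """Local axis: non-overlapping b x b windows of adjacent pixels."""
    assert h % b == 0 and w % b == 0, "image size must be divisible by b"
    groups = []
    for by in range(h // b):
        for bx in range(w // b):
            groups.append([(by * b + i, bx * b + j)
                           for i in range(b) for j in range(b)])
    return groups


def grid_partition(h: int, w: int, g: int) -> List[List[Coord]]:
    """Global axis: a fixed g x g grid of pixels spaced (h//g, w//g) apart,
    so every group spans the whole image like a dilated window."""
    assert h % g == 0 and w % g == 0, "image size must be divisible by g"
    sy, sx = h // g, w // g  # stride between members of one group
    groups = []
    for oy in range(sy):
        for ox in range(sx):
            groups.append([(i * sy + oy, j * sx + ox)
                           for i in range(g) for j in range(g)])
    return groups


def gated_mix(feats: Feats, groups: List[List[Coord]],
              weight: List[List[float]], bias: List[float]) -> List[List[float]]:
    """Spatial gating inside each group. The n x n weight (n = group size)
    is shared by all groups, so its size does not depend on h or w."""
    h, w = len(feats), len(feats[0])
    out = [[0.0] * w for _ in range(h)]
    for group in groups:
        vs = [feats[y][x][1] for (y, x) in group]
        for k, (y, x) in enumerate(group):
            mixed = sum(weight[k][j] * vs[j] for j in range(len(group))) + bias[k]
            out[y][x] = feats[y][x][0] * mixed
    return out


if __name__ == "__main__":
    H = W = 4
    # u = 1 everywhere, v = raster index, so the gate output equals the mixed v.
    feats = [[(1.0, float(y * W + x)) for x in range(W)] for y in range(H)]
    avg = [[0.25] * 4 for _ in range(4)]  # 4 tokens per group, plain averaging
    zero = [0.0] * 4
    local = gated_mix(feats, block_partition(H, W, 2), avg, zero)
    global_ = gated_mix(feats, grid_partition(H, W, 2), avg, zero)
    print(local[0][0])    # 2.5: mean of pixels (0,0), (0,1), (1,0), (1,1)
    print(global_[0][0])  # 5.0: mean of pixels (0,0), (0,2), (2,0), (2,2)
```

With the same averaging weight, the block branch blends neighboring pixels while the grid branch blends pixels half an image apart. Because the gating weight is n x n with n = b*b or g*g, the same parameters apply to any input whose height and width are divisible by b and g, which is one way to read the abstract's "fully-convolutional" and high-resolution claims. Our understanding is that the full model runs both branches in parallel on separate channel groups and uses learned multi-channel projections; the sketch runs them separately on scalar features for clarity.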
Pages: 5759-5770
Number of pages: 12