Mamba-GIE: A visual state space models-based generalized image extrapolation method via dual-level adaptive feature fusion

Times cited: 0
Authors
Zhang, Ruoyi [1 ]
Li, Guotao [1 ]
Qu, Shuyi [1 ,2 ,3 ]
Wang, Jun [1 ,2 ,3 ]
Peng, Jinye [1 ,2 ,3 ]
Affiliations
[1] Northwest Univ, Coll Informat Sci & Technol, Xian 710127, Shaanxi, Peoples R China
[2] State Prov Joint Engn & Res Ctr Adv Networking & I, Xian 710127, Shaanxi, Peoples R China
[3] Shaanxi Key Lab Higher Educ Inst Generat Artificia, Xian 710127, Shannxi, Peoples R China
Keywords
Image extrapolation; Visual state space models; Generative adversarial networks; Feature fusion; Attention mechanism;
DOI
10.1016/j.eswa.2024.125961
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generalized Image Extrapolation (GIE) is a sub-task of image generation and a challenging ill-posed problem: the goal is to predict the unknown surrounding regions of an image from its known central area. Existing methods face a threefold dilemma. (1) Convolutional Neural Network (CNN)-based methods extract local details precisely but, owing to their inductive bias, underperform at capturing global semantic information, so the generated image structure and layout lack consistency. (2) Vision Transformer (ViT)-based methods, although superior at extracting global information, are not sufficiently fine-grained in generating details and textures. (3) ViT-based approaches rely on the self-attention mechanism, which imposes a heavy computational burden when processing images and makes model training inefficient. We propose a novel model, Mamba-GIE, designed to balance information at different granularities and to address these unresolved challenges in GIE. At the macro level, Mamba-GIE adopts a U-shaped encoder-decoder architecture whose core basic block is built on improved Hybrid State Space Models (Hybrid-SSMs). Within each basic block, the input feature map is processed by two parallel branches: (1) a Mamba branch that extracts global information and (2) a CNN branch that handles local details. At the micro level, we introduce a dual-level adaptive feature fusion mechanism that fuses features adaptively both within and across Hybrid-SSMs blocks. Extensive experiments on three public datasets demonstrate that our approach outperforms existing GIE methods on most evaluation metrics and in image generation quality. Comprehensive ablation studies and resource consumption assessments further confirm the efficiency and effectiveness of Mamba-GIE. Code: https://github.com/zrymsm/Mamba-GIE.
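The dual-branch basic block described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository holds that), and every name below (`ssm_scan`, `global_branch`, `local_branch`, `adaptive_fuse`, `hybrid_block`, the parameters `a`, `b`, `c`, `kernel`, `w`) is hypothetical. It rests on several stated assumptions. The global branch is a plain diagonal linear state space recurrence with fixed parameters, a simplification of Mamba's selective scan, in which the parameters depend on the input. It scans the flattened feature map in a single raster order, whereas vision SSMs usually scan in several directions. The local branch is a 3x3 depthwise convolution. Finally, a per-channel sigmoid gate computed from pooled branch statistics stands in for the intra-block adaptive fusion.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Diagonal linear SSM over a token sequence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    x: (L, D) tokens; a, b, c: (D,) per-channel parameters. Hypothetical simplification:
    Mamba's selective scan makes these parameters input-dependent; here they are fixed.
    """
    length, dim = x.shape
    h = np.zeros(dim)
    y = np.empty_like(x)
    for t in range(length):  # single pass: cost is linear in the sequence length
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def global_branch(feat, a, b, c):
    """Flatten an (H, W, D) feature map in raster order, scan it, and reshape back."""
    height, width, dim = feat.shape
    seq = feat.reshape(height * width, dim)
    return ssm_scan(seq, a, b, c).reshape(height, width, dim)


def local_branch(feat, kernel):
    """3x3 depthwise convolution (cross-correlation) with zero padding; kernel: (3, 3, D)."""
    height, width, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))  # zero-pad the spatial dims only
    out = np.zeros_like(feat)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + height, j:j + width]
    return out


def adaptive_fuse(global_feat, local_feat, w):
    """Hypothetical stand-in for intra-block adaptive fusion:
    out = gate * global_feat + (1 - gate) * local_feat, with a per-channel sigmoid gate."""
    pooled = np.concatenate([global_feat.mean(axis=(0, 1)), local_feat.mean(axis=(0, 1))])  # (2D,)
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled)))  # (D,), each entry in (0, 1)
    return gate * global_feat + (1.0 - gate) * local_feat


def hybrid_block(feat, params):
    """Hypothetical dual-branch block: global SSM branch + local conv branch, fused, plus a residual."""
    global_feat = global_branch(feat, params["a"], params["b"], params["c"])
    local_feat = local_branch(feat, params["kernel"])
    return feat + adaptive_fuse(global_feat, local_feat, params["w"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D = 8, 8, 4
    feat = rng.standard_normal((H, W, D))
    params = {
        "a": np.full(D, 0.9), "b": np.ones(D), "c": np.ones(D),
        "kernel": 0.1 * rng.standard_normal((3, 3, D)),
        "w": 0.1 * rng.standard_normal((D, 2 * D)),
    }
    print(hybrid_block(feat, params).shape)  # (8, 8, 4)
```

Because the scan visits each of the H·W tokens once, its cost grows linearly with the number of tokens. Full self-attention compares every pair of tokens, so its cost grows quadratically. That difference is the efficiency argument for replacing ViT blocks with SSM blocks.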
Pages: 18