WFormer: A Transformer-Based Soft Fusion Model for Robust Image Watermarking

Cited by: 3
Authors
Luo, Ting [1 ]
Wu, Jun [2 ]
He, Zhouyan [1 ]
Xu, Haiyong [2 ]
Jiang, Gangyi [2 ]
Chang, Chin-Chen [3 ]
Affiliations
[1] Ningbo Univ, Coll Sci & Technol, Ningbo 315212, Peoples R China
[2] Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China
[3] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung 40724, Taiwan
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024
Funding
National Natural Science Foundation of China
Keywords
Watermarking; Feature extraction; Transformers; Decoding; Convolution; Noise; Robustness; transformer; soft fusion; cross-attention
DOI
10.1109/TETCI.2024.3386916
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most deep neural network (DNN) based image watermarking models employ an encoder-noise-decoder structure in which the watermark is simply duplicated for expansion and then directly fused with image features to produce the encoded image. However, simple duplication introduces watermark redundancy, and during image feature extraction and direct fusion there is no communication between the cover image and the watermark, which reside in different domains; this degrades watermarking performance. To address these drawbacks, this paper proposes a Transformer-based soft fusion model for robust image watermarking, named WFormer. Specifically, to expand the watermark effectively, a watermark preprocess module (WPM) uses Transformers to extract valid, expanded watermark features by computing self-attention. Then, to replace direct fusion, a soft fusion module (SFM) integrates Transformers into the fusion of the image with the watermark by mining their long-range correlations. Precisely, self-attention is computed to extract the latent features of each, while cross-attention is learned to bridge their gap and embed the watermark effectively. In addition, a feature enhancement module (FEM) builds communication between the cover image and the watermark by capturing their cross-feature dependencies, tuning the image features in accordance with the watermark features for better fusion. Experimental results show that the proposed WFormer outperforms existing state-of-the-art watermarking models in terms of invisibility, robustness, and embedding capacity. Furthermore, ablation results confirm the effectiveness of the WPM, the FEM, and the SFM.
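The cross-attention soft fusion the abstract describes can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes single-head scaled dot-product attention over flattened feature tokens, with image features as queries and watermark features as keys/values, combined through a residual connection. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, wm_feats):
    """Queries come from image features; keys/values from watermark features."""
    d = img_feats.shape[-1]
    scores = img_feats @ wm_feats.T / np.sqrt(d)      # (n_img, n_wm) affinities
    return softmax(scores, axis=-1) @ wm_feats        # watermark info per image token

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 32))   # 64 image tokens, feature dim 32
wm = rng.standard_normal((16, 32))    # 16 watermark tokens, feature dim 32

# Residual "soft fusion": image features tuned by attended watermark features
fused = img + cross_attention(img, wm)
print(fused.shape)  # (64, 32)
```

In the actual model, learned projection matrices would map each modality to queries, keys, and values before the dot products; the sketch omits them to keep the attention-based fusion step visible.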
Pages: 4179-4196
Page count: 18