Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Cited by: 0
Authors
Xu, Huaiyuan [1]
Liao, Jing [2]
Liu, Huaping [3]
Sun, Yuxiang [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment;
DOI
10.1109/TCSVT.2023.3288370
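Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Subject classification codes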
0808 ; 0809 ;
Abstract
Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, for example in appearance and size. In other words, contents with the same semantics can look very different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer that enhances semantic features with global awareness, and a probabilistic correlation module that adaptively fuses multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
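The baseline the abstract contrasts against, matching local features by cosine similarity, can be illustrated with a minimal sketch. This is not the paper's network or either of its proposed modules. The function names `cosine_correlation` and `best_matches` and the toy feature vectors are assumptions made up for illustration only.

```python
import math

def cosine_correlation(src_feats, tgt_feats):
    """Return C where C[i][j] is the cosine similarity between
    source feature i and target feature j (hypothetical helper)."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        # Leave zero vectors unchanged to avoid division by zero.
        return [x / norm for x in v] if norm > 0 else list(v)

    src = [normalize(v) for v in src_feats]
    tgt = [normalize(v) for v in tgt_feats]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]

def best_matches(corr):
    """For each source location, pick the target index with the highest similarity."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]

# Toy data (assumed): three source and three target locations with 2-D features.
src = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
tgt = [[0.0, 5.0], [3.0, 3.0], [4.0, 0.0]]
corr = cosine_correlation(src, tgt)
matches = best_matches(corr)
print(matches)  # [2, 0, 1]
```

Because each location is matched only on its own feature vector, two regions with the same semantics but different appearance or size can score poorly, which is the limitation the abstract describes.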
Pages: 897-910
Page count: 14
Related papers
50 in total
  • [1] Multi-Scale Convolutional Features Network for Semantic Segmentation in Indoor Scenes
    Wang, Yanran
    Chen, Qingliang
    Chen, Shilang
    Wu, Junjun
    IEEE ACCESS, 2020, 8 : 89575 - 89583
  • [2] Multi-Scale Metric Learning for Few-Shot Learning
    Jiang, Wen
    Huang, Kai
    Geng, Jie
    Deng, Xinyang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (03) : 1091 - 1102
  • [3] Global-Local Interplay in Semantic Alignment for Few-Shot Learning
    Hao, Fusheng
    He, Fengxiang
    Cheng, Jun
    Tao, Dacheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (07) : 4351 - 4363
  • [4] Deep Dense Multi-Scale Network for Snow Removal Using Semantic and Depth Priors
    Zhang, Kaihao
    Li, Rongqing
    Yu, Yanjiang
    Luo, Wenhan
    Li, Changsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 7419 - 7431
  • [5] Learning Motion-Guided Multi-Scale Memory Features for Video Shadow Detection
    Lin, Junhao
    Shen, Jiaxing
    Yang, Xin
    Fu, Huazhu
    Zhang, Qing
    Li, Ping
    Sheng, Bin
    Wang, Liansheng
    Zhu, Lei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12288 - 12300
  • [6] Learning Deep Global Multi-Scale and Local Attention Features for Facial Expression Recognition in the Wild
    Zhao, Zengqun
    Liu, Qingshan
    Wang, Shanmin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6544 - 6556
  • [7] Multi-Scale Semantic Map Distillation for Lightweight Pavement Crack Detection
    Wang, Xin
    Mao, Zhaoyong
    Liang, Zhiwei
    Shen, Junge
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (10) : 15081 - 15093
  • [8] SGUIE-Net: Semantic Attention Guided Underwater Image Enhancement With Multi-Scale Perception
    Qi, Qi
    Li, Kunqian
    Zheng, Haiyong
    Gao, Xiang
    Hou, Guojia
    Sun, Kun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6816 - 6830
  • [9] Papillary Thyroid Carcinoma Semantic Segmentation Using Multi-Scale Adaptive Convolutional Network With Dual Decoders
    Payatsuporn, Thanat
    Kantavat, Pittipol
    Tangnuntachai, Nichthida
    Tipparawong, Nopporn
    Techapapa, Waratchanok
    Kijsirikul, Boonserm
    Keelawat, Somboon
    IEEE ACCESS, 2025, 13 : 17340 - 17353
  • [10] From Global to Local: Multi-Patch and Multi-Scale Contrastive Similarity Learning for Unsupervised Defocus Blur Detection
    Li, Jinxing
    Liang, Beicheng
    Lu, Xiangwei
    Li, Mu
    Lu, Guangming
    Xu, Yong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1158 - 1169