Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Cited by: 0
Authors
Xu, Huaiyuan [1]
Liao, Jing [2]
Liu, Huaping [3]
Sun, Yuxiang [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment;
DOI
10.1109/TCSVT.2023.3288370
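Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract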
Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects in properties such as appearance and size. In other words, contents with the same semantics can look significantly different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer that enhances semantic features with global awareness, and a probabilistic correlation module that adaptively fuses multi-scale information based on learned confidence scores. We use the unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
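The abstract mentions two generic ingredients: cosine similarity between local features, and fusion of multi-scale information weighted by confidence scores. The following is a minimal sketch of those two ideas only, not the authors' network. All function names are hypothetical, and it assumes that each scale has one scalar confidence score, that the scores are normalized with a softmax, and that the per-scale correlation matrices have already been resampled to a common size.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def correlation_matrix(src_feats, tgt_feats):
    """Entry [i][j] is the similarity of source position i and target position j."""
    return [[cosine_similarity(s, t) for t in tgt_feats] for s in src_feats]

def softmax(scores):
    """Numerically stable softmax; turns raw confidences into weights summing to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_multi_scale(corrs, confidences):
    """Weighted sum of same-sized correlation matrices (assumption: one scalar
    confidence per scale, which is a simplification for illustration)."""
    weights = softmax(confidences)
    rows, cols = len(corrs[0]), len(corrs[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, corrs)) for j in range(cols)]
        for i in range(rows)
    ]

def best_matches(corr):
    """For each source position, the index of the most similar target position."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]

# Toy data: two source and two target feature vectors.
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 2.0], [3.0, 0.0]]
fine = correlation_matrix(src, tgt)       # [[0.0, 1.0], [1.0, 0.0]]
coarse = [[1.0, 0.0], [0.0, 1.0]]         # a made-up, less trusted scale
fused = fuse_multi_scale([fine, coarse], confidences=[2.0, 0.0])
print(best_matches(fused))                # prints [1, 0]
```

Because the first scale receives the higher confidence, its matches dominate the fused result. A learned model would predict the confidences rather than take them as fixed inputs.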
Pages: 897 - 910
Number of pages: 14
Related papers
50 in total
  • [21] Learning Multi-Scale Knowledge-Guided Features for Text-Guided Face Recognition
    Hasan, Md Mahedi
    Sami, Shoaib Meraj
    Nasrabadi, Nasser M.
    Dawson, Jeremy
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2025, 7 (02) : 195 - 209
  • [22] Multi-View Gait Recognition With Joint Local Multi-Scale and Global Contextual Spatio-Temporal Features
    Zhai, Wenzhe
    Li, Haomiao
    Zheng, Chaoqun
    Xing, Xianglei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1123 - 1135
  • [23] Learning to Segment From Scribbles Using Multi-Scale Adversarial Attention Gates
    Valvano, Gabriele
    Leo, Andrea
    Tsaftaris, Sotirios A.
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (08) : 1990 - 2001
  • [24] MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation
    Gao, Guangwei
    Xu, Guoan
    Yu, Yi
    Xie, Jin
    Yang, Jian
    Yue, Dong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (12) : 25489 - 25499
  • [25] MUSTER: A Multi-Scale Transformer-Based Decoder for Semantic Segmentation
    Xu, Jing
    Shi, Wentao
    Gao, Pan
    Li, Qizhu
    Wang, Zhengwei
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025, 9 (01) : 202 - 212
  • [26] High-Level Semantic Networks for Multi-Scale Object Detection
    Cao, Jiale
    Pang, Yanwei
    Zhao, Shengjie
    Li, Xuelong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (10) : 3372 - 3386
  • [27] MANet: Multi-Scale Attention Network for Correspondence Learning
    Chen, Yukai
    Zheng, Linxin
    Liu, Xin
    Xiao, Guobao
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1978 - 1982
  • [28] Multi-Scale Alignment Domain Adaptation for Ship Classification in Multi-Resolution SAR Images
    Liu, Zhunga
    Li, Kun
    Wang, Longfei
    Zhang, Zuowei
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, : 4051 - 4062
  • [29] PSTNet: Enhanced Polyp Segmentation With Multi-Scale Alignment and Frequency Domain Integration
    Xu, Wenhao
    Xu, Rongtao
    Wang, Changwei
    Li, Xiuli
    Xu, Shibiao
    Guo, Li
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10) : 6042 - 6053
  • [30] Multi-Scale Structure Perception and Global Context-Aware Method for Small-Scale Pedestrian Detection
    Gao, Hao
    Huang, Shucheng
    Li, Mingxing
    Li, Tian
    IEEE ACCESS, 2024, 12 : 76392 - 76403