Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Cited by: 0
Authors
Xu, Huaiyuan [1]
Liao, Jing [2]
Liu, Huaping [3]
Sun, Yuxiang [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment;
DOI
10.1109/TCSVT.2023.3288370
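Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract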
Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, such as differences in appearance and size. In other words, contents with the same semantics tend to look significantly different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer for enhancing semantic features with global awareness, and a probabilistic correlation module for adaptively fusing multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
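The two ideas the abstract names, cosine-similarity correlation between local features and confidence-weighted fusion across scales, can be illustrated with a minimal sketch. This is not the authors' network: the function names are hypothetical, the confidence values are fixed inputs rather than learned, and one scalar confidence per scale is assumed where the paper's module may work at a finer granularity.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def correlation_matrix(src_feats, tgt_feats):
    """Entry [i][j] is the similarity of source location i and target location j."""
    return [[cosine_similarity(s, t) for t in tgt_feats] for s in src_feats]


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_correlations(corr_per_scale, confidence_logits):
    """Weighted sum of same-shaped correlation matrices.

    Assumption: one scalar confidence per scale, turned into weights by softmax.
    """
    weights = softmax(confidence_logits)
    rows, cols = len(corr_per_scale[0]), len(corr_per_scale[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for weight, corr in zip(weights, corr_per_scale):
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += weight * corr[i][j]
    return fused


def best_matches(corr):
    """For each source location, the index of the most similar target location."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]


if __name__ == "__main__":
    # Toy 2-D features at two locations per image (illustrative values only).
    src = [[1.0, 0.0], [0.0, 1.0]]
    tgt = [[0.0, 2.0], [3.0, 0.0]]
    fine = correlation_matrix(src, tgt)
    coarse = [[0.2, 0.8], [0.6, 0.4]]  # stand-in for a second, coarser scale
    fused = fuse_correlations([fine, coarse], confidence_logits=[2.0, 0.0])
    print(best_matches(fused))
```

A higher confidence logit pulls the fused matrix toward that scale's correlation, which is the intuition behind letting confidence scores decide how much each scale contributes.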
Pages: 897-910
Page count: 14
Related papers
50 in total
  • [41] Learning a Deep Multi-Scale Feature Ensemble and an Edge-Attention Guidance for Image Fusion
    Liu, Jinyuan
    Fan, Xin
    Jiang, Ji
    Liu, Risheng
    Luo, Zhongxuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 105 - 119
  • [42] Scale-space theory-based multi-scale features for aircraft classification using HRRP
    Liu, Jia
    Fang, Ning
    Xie, Yong Jun
    Wang, Bao Fa
    ELECTRONICS LETTERS, 2016, 52 (06) : 475 - 476
  • [43] Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models
    Park, Jiwon
    Jeong, Dasol
    Lee, Hyebean
    Han, Seunghee
    Paik, Joonki
    IEEE ACCESS, 2024, 12 : 158810 - 158823
  • [44] Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark
    Ding, Leqi
    Liu, Lei
    Huang, Yan
    Li, Chenglong
    Zhang, Cheng
    Wang, Wei
    Wang, Liang
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (07) : 7673 - 7686
  • [45] Towards a multi-scale semantic characterization of the built heritage: From the column to the urban scale
    Lo Buglio, David
    Van Dongen, Alexandre
    DISEGNARECON, 2021, 14 (26) : 302 - 3.16
  • [46] Classification and Quantification of Emphysema Using a Multi-Scale Residual Network
    Peng, Liying
    Lin, Lanfen
    Hu, Hongjie
    Li, Huali
    Chen, Qingqing
    Ling, Xiaoli
    Wang, Dan
    Han, Xianhua
    Iwamoto, Yutaro
    Chen, Yen-wei
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2019, 23 (06) : 2526 - 2536
  • [47] A Robust Image Semantic Communication System With Multi-Scale Vision Transformer
    Peng, Xiang
    Qin, Zhijin
    Tao, Xiaoming
    Lu, Jianhua
    Letaief, Khaled B.
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2025, 43 (04) : 1278 - 1291
  • [48] Multi-Task Learning Model Based on Multi-Scale CNN and LSTM for Sentiment Classification
    Jin, Ning
    Wu, Jiaxian
    Ma, Xiang
    Yan, Ke
    Mo, Yuchang
    IEEE ACCESS, 2020, 8 : 77060 - 77072
  • [49] Multi-scale Evaluation of Texture Features for Salt Dome Detection
    Ferreira, Rodrigo da Silva
    Mattos, Andrea Britto
    Brazil, Emilio Vital
    Cerqueira, Renato
    Ferraz, Marco
    Cersosimo, Sergio
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 632 - 635
  • [50] Small-Size Target Detection in Remotely Sensed Image Using Improved Multi-Scale Features and Attention Mechanism
    Zhao, Hu
    Chu, Kaibin
    Zhang, Ji
    Feng, Chengtao
    IEEE ACCESS, 2023, 11 : 56703 - 56711