Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Cited by: 0

Authors:

Xu, Huaiyuan [1]

Liao, Jing [2]

Liu, Huaping [3]

Sun, Yuxiang [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China

[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024, Vol. 34, No. 2

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment;

DOI:

10.1109/TCSVT.2023.3288370

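Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes: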

0808 ; 0809 ;

Abstract:

Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, such as differences in appearance and size. In other words, contents with the same semantics tend to differ significantly in visual appearance. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer that enhances semantic features with global awareness, and a probabilistic correlation module that adaptively fuses multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.


Pages: 897-910

Page count: 14
