Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Cited by: 1

Authors:

Xu, Huaiyuan [1]

Liao, Jing [2]

Liu, Huaping [3]

Sun, Yuxiang [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China

[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment;

DOI:

10.1109/TCSVT.2023.3288370

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, for example in appearance and size. In other words, contents with the same semantics can look significantly different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer for enhancing semantic features with global awareness, and a probabilistic correlation module for adaptively fusing multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
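
The two ideas named in the abstract, cosine-similarity correlation between local features and confidence-weighted fusion across scales, can be illustrated with a small sketch. This is not the authors' network: the function names (cosine_correlation, fuse_scales, best_matches) are hypothetical, the confidence values are hand-picked rather than learned, each scale gets a single scalar confidence, and all scales are assumed to have been resampled to the same number of positions.

```python
import math


def cosine_correlation(src, tgt):
    """Cosine similarity between every source and target feature vector."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        return [x / norm for x in v]

    src_u = [unit(v) for v in src]
    tgt_u = [unit(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_u] for s in src_u]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(correlations, confidence_logits):
    """Blend same-shaped per-scale correlation matrices.

    Assumption: one scalar confidence per scale, normalised with softmax.
    """
    weights = softmax(confidence_logits)
    rows, cols = len(correlations[0]), len(correlations[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, correlations)) for j in range(cols)]
        for i in range(rows)
    ]


def best_matches(corr):
    """Index of the highest-scoring target position for each source position."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]


# Toy data: two source and two target positions with 2-D features.
fine = cosine_correlation([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.0]])
coarse = [[0.6, 0.4], [0.4, 0.6]]  # a less reliable, hand-written coarse-scale matrix
fused = fuse_scales([fine, coarse], confidence_logits=[2.0, 0.0])
matches = best_matches(fused)
```

Because the fine scale receives the larger confidence in this toy setup, its correlations dominate the fused matrix and the ambiguous coarse scale has little influence on the final matches.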


Pages: 897-910

Page count: 14
