Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

被引:1
作者
Xu, Huaiyuan [1 ]
Liao, Jing [2 ]
Liu, Huaping [3 ]
Sun, Yuxiang [1 ]
机构
[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
基金
中国国家自然科学基金;
关键词
Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment;
D O I
10.1109/TCSVT.2023.3288370
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization, etc. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, such as appearance, size, etc. In other words, contents with the same semantics tend to be significantly different in vision. To address this issue, we propose a novel deep neural network of which the core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer for enhancing semantic features with global awareness; a probabilistic correlation module for adaptively fusing multi-scale information based on the learned confidence scores. We use the unified network architecture to achieve two types of semantic alignment, namely, cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks, and outperforms the state of the arts in cross-domain semantic alignment.
引用
收藏
页码:897 / 910
页数:14
相关论文
共 52 条
[1]   Demystifying Unsupervised Semantic Correspondence Estimation [J].
Aygun, Mehmet ;
Mac Aodha, Oisin .
COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690 :125-142
[2]  
Barnes C, 2010, LECT NOTES COMPUT SC, V6313, P29
[3]   End-to-End Object Detection with Transformers [J].
Carion, Nicolas ;
Massa, Francisco ;
Synnaeve, Gabriel ;
Usunier, Nicolas ;
Kirillov, Alexander ;
Zagoruyko, Sergey .
COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229
[4]   Large-Scale Structure from Motion with Semantic Constraints of Aerial Images [J].
Chen, Yu ;
Wang, Yao ;
Lu, Peng ;
Chen, Yisong ;
Wang, Guoping .
PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I, 2018, 11256 :347-359
[5]  
Cho Seokju, 2021, Advances in Neural Information Processing Systems, V34
[6]  
Choy CB, 2016, ADV NEUR IN, V29
[7]   Adaptive Disparity Candidates Prediction Network for Efficient Real-Time Stereo Matching [J].
Dai, He ;
Zhang, Xuchong ;
Zhao, Yongli ;
Sun, Hongbin ;
Zheng, Nanning .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) :3099-3110
[8]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[9]  
Dosovitskiy A., 2021, 9 INT C LEARN REPR I
[10]   The Pascal Visual Object Classes (VOC) Challenge [J].
Everingham, Mark ;
Van Gool, Luc ;
Williams, Christopher K. I. ;
Winn, John ;
Zisserman, Andrew .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) :303-338