Transform Vision Patterns for Multi-Task Pixel Learning

Cited by: 4
Authors
Zhang, Xiaoya [1]
Zhou, Ling [1]
Li, Yong [1]
Cui, Zhen [1]
Xie, Jin [1]
Yang, Jian [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China;
Keywords
Segmentation; Depth estimation; Multi-task Pixel learning; Visual Transformation;
DOI
10.1145/3474085.3475501
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-task pixel perception is one of the most important topics in the field of machine intelligence. Inspired by the observation of cross-task interdependencies of visual patterns, we propose a multi-task vision pattern transformation (VPT) method to adaptively correlate and transfer cross-task visual patterns by leveraging the powerful transformer mechanism. Specifically, to better transfer visual patterns, we build two types of pattern transformation based on the statistical prior that the affinity relations across tasks are correlated. One aims to transfer feature patterns for the integration of different task features; the other aims to exchange structure patterns for mining and leveraging the latent interaction cues. These two types of transformations are encapsulated into two VPT units, which provide universal matching interfaces for multi-task learning, complement each other to guide the transmission of feature/structure patterns, and finally realize an adaptive selection of important patterns across tasks. Extensive experiments on the joint learning of semantic segmentation, depth prediction and surface normal estimation demonstrate that our proposed method is more effective than the baselines and achieves state-of-the-art performance on the three pixel-level visual tasks.
Pages: 97-106
Number of pages: 10