Soft Transfer Learning via Gradient Diagnosis for Visual Relationship Detection

Cited by: 12

Authors:

Chen, Diqi [1,2,3,4]

Liang, Xiaodan [5]

Wang, Yizhou [3,4]

Gao, Wen [3,4]

Affiliations:

[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Inst Comp Technol, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Peking Univ, Natl Engn Lab Video Technol, Cooperat Medianet Innovat Ctr, Beijing, Peoples R China

[4] Peking Univ, Key Lab Machine Percept MoE, Sch Elect Engn & Comp Sci, Beijing, Peoples R China

[5] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangzhou, Guangdong, Peoples R China

Source:

2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2019

DOI:

10.1109/WACV.2019.00124

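Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract: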

Detecting all visual relationships (e.g., "person-wear-shirt") is posed as the most fundamental task toward ultimate semantic reasoning. However, due to the rich context embedded in images and diverse language ambiguities (e.g., "person" vs. "man"), it is unrealistic to annotate all possible relationships to provide a noise-free supervised setting. All prior approaches simply adopt the traditional fully supervised detection pipeline and ignore the effect of incomplete annotations on model convergence, resulting in unstable optimization and unsatisfactory performance. In this work, we make the first attempt to address this critical incomplete-annotation issue and reformulate the task via Soft Transfer Learning (STL), which aims to transfer knowledge learned from the annotations at hand to the uncertain pairs in a self-supervised way. The knowledge transfer process is inferred from a principled gradient diagnosis. Extensive experiments on the VRD and large-scale VG benchmarks demonstrate the superiority of our STL method.

Pages: 1118-1126

Page count: 9

