MULTI-GRANULARITY REASONING FOR SOCIAL RELATION RECOGNITION FROM IMAGES

Cited by: 37
Authors
Zhang, Meng [1 ,3 ]
Liu, Xinchen [2 ]
Liu, Wu [2 ]
Zhou, Anfu [1 ]
Ma, Huadong [1 ]
Mei, Tao [2 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing 100876, China
[2] JD AI Research, JD.com, Beijing 100101, China
[3] JD AI Research, Beijing, China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2019
Funding
National Natural Science Foundation of China;
Keywords
Social Relation Recognition; Multi-Granularity Reasoning; Pose-Guided Graph; Graph Convolutional Network;
DOI
10.1109/ICME.2019.00279
Chinese Library Classification
TP31 [Computer Software];
Discipline Classification Code
081202; 0835;
Abstract
Discovering social relations in images can help machines better interpret human behavior. However, automatically recognizing social relations in images is challenging due to the significant gap between the domains of visual content and social relations. Existing studies process features such as facial expressions, body appearance, and contextual objects separately, and thus cannot comprehensively capture multi-granularity semantics such as scenes, regional cues of persons, and interactions among persons and objects. To bridge this domain gap, we propose a Multi-Granularity Reasoning framework for social relation recognition from images. Global knowledge and mid-level details are learned from the whole scene and from the regions of persons and objects, respectively. Most importantly, we exploit fine-grained pose keypoints of persons to discover the interactions among persons and objects. Specifically, the pose-guided Person-Object Graph and Person-Pose Graph are proposed to model the actions from persons to objects and the interactions between paired persons, respectively. Based on these graphs, social relation reasoning is performed by graph convolutional networks. Finally, the global features and the reasoned knowledge are integrated as a comprehensive representation for social relation recognition. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed framework.
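A minimal sketch of the graph-reasoning step described in the abstract, assuming placeholder region features, a hand-set person-object adjacency, and random weights (these are illustrative stand-ins, not the authors' implementation): one symmetrically normalized graph-convolution pass over a toy Person-Object Graph, followed by fusion with a global scene feature and a linear relation classifier.

import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt       # symmetric normalization
    return np.maximum(a_norm @ feats @ weight, 0)  # ReLU activation

# Toy pose-guided Person-Object graph: 2 person nodes + 2 object nodes,
# each with a 16-d feature vector (stand-ins for CNN region features).
# In the paper's setting, person-object edges would be derived from pose
# keypoints (e.g., a hand keypoint near an object box); here they are hand-set.
rng = np.random.default_rng(0)
node_feats = rng.normal(size=(4, 16))
adjacency = np.array([[0, 0, 1, 1],
                      [0, 0, 0, 1],
                      [1, 0, 0, 0],
                      [1, 1, 0, 0]], dtype=float)

w1 = rng.normal(size=(16, 16))
reasoned = gcn_layer(adjacency, node_feats, w1)    # reasoned node features

# Fuse pooled graph knowledge with a (placeholder) global scene feature and
# score a few hypothetical relation classes with a linear head.
scene_feat = rng.normal(size=(16,))
fused = np.concatenate([reasoned.mean(axis=0), scene_feat])
w_cls = rng.normal(size=(32, 3))                   # e.g., friend / family / colleague
print("relation scores:", fused @ w_cls)

In the full framework, both pose-guided graphs (Person-Object and Person-Pose) are reasoned over by graph convolutional networks, and their outputs are combined with the scene-level representation before classification.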
Pages: 1618-1623
Number of pages: 6