Structured Neural Motifs: Scene Graph Parsing via Enhanced Context

Times cited: 3
Authors
Li, Yiming [1, 4]
Yang, Xiaoshan [2, 3, 4]
Xu, Changsheng [1, 2, 3, 4]
Affiliations
[1] HeFei Univ Technol, Hefei, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2020), PT II | 2020 / Vol. 11962
Funding
National Natural Science Foundation of China;
Keywords
Scene graph; Deep learning; LSTMs;
DOI
10.1007/978-3-030-37734-2_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A scene graph is a structured representation of the visual content of an image. It is helpful for complex visual understanding tasks such as image captioning, visual question answering, and semantic image retrieval. Because real-world images usually contain multiple object instances and complex relationships, context information is extremely important for scene graph generation. It has been noted that the context dependencies among different nodes in the scene graph are asymmetric, which means that relationship labels can very likely be predicted directly from object labels, but not vice versa. Based on this finding, the existing Motifs network has successfully exploited the context patterns among object nodes and the dependencies between object nodes and relation nodes. However, it neglects spatial information and the context dependencies among relation nodes. In this work, we propose the Structured Motif Network (StrcMN), which predicts object labels and pairwise relationships by mining more complete global context features. Experiments show that our model significantly outperforms previous methods on the VRD and Visual Genome datasets.
Pages: 175-188
Number of pages: 14