MORE: A Multimodal Object-Entity Relation Extraction Dataset with a Benchmark Evaluation

Times Cited: 0
Authors
He, Liang [1 ]
Wang, Hongke [1 ]
Cao, Yongchang [1 ]
Wu, Zhen [1 ]
Zhang, Jianbing [1 ]
Dai, Xinyu [1 ]
Affiliations
[1] Nanjing University, National Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
dataset; multimodal; relation extraction; benchmark evaluation
DOI
10.1145/3581783.3612209
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Extracting relational facts from multimodal data is a crucial task in the fields of multimedia and knowledge graphs, one that supports a wide range of real-world applications. Recent studies have focused on recognizing relational facts whose entities both appear in a single modality, using information from other modalities only as a supplement. Such works, however, disregard a substantial number of multimodal relational facts that span modalities, e.g., one entity appearing in the text and the other in an image. In this paper, we propose a new task, Multimodal Object-Entity Relation Extraction, which aims to extract "object-entity" relational facts from paired image and text data. To facilitate research on this task, we introduce MORE, a new dataset comprising 21 relation types and 20,136 multimodal relational facts annotated on 3,522 pairs of textual news titles and corresponding images. To demonstrate the challenges of Multimodal Object-Entity Relation Extraction, we evaluate recent state-of-the-art methods for multimodal relation extraction and conduct a comprehensive experimental analysis on MORE. Our results reveal significant challenges for existing methods, underlining the need for further research on this task, and based on our experiments we identify several promising directions for future work. The MORE dataset and code are available at https://github.com/NJUNLP/MORE.
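As a rough illustration of what the task asks a model to produce, the Python sketch below shows one way an "object-entity" relational fact from a MORE-style example might be represented: an entity mention from the news title is linked to a visual object in the paired image by one of the dataset's 21 relation types. The field names, bounding-box format, and relation label are illustrative assumptions, not the dataset's actual schema; consult the linked repository for the real format.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class ObjectEntityFact:
        """One cross-modal relational fact: a textual entity paired with an image object."""
        title: str                               # textual news title
        image_path: str                          # path to the paired news image
        entity: str                              # entity mention found in the title
        object_bbox: Tuple[int, int, int, int]   # (x1, y1, x2, y2) of the visual object
        relation: str                            # one of the 21 relation types

    # Hypothetical example: the title entity "Mayor" is related to a
    # person-shaped object detected in the image. All values are invented.
    fact = ObjectEntityFact(
        title="Mayor opens the new city bridge",
        image_path="images/news_0001.jpg",
        entity="Mayor",
        object_bbox=(34, 50, 210, 400),
        relation="present_at",                   # illustrative relation label
    )
    print(fact.entity, "--", fact.relation)

Note that, unlike conventional multimodal relation extraction where both entities come from the text, one argument here is a visual object, which is what makes the task cross-modal.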
Pages: 4564 - 4573
Page count: 10