MORE: A Multimodal Object-Entity Relation Extraction Dataset with a Benchmark Evaluation

Times Cited: 0
Authors
He, Liang [1 ]
Wang, Hongke [1 ]
Cao, Yongchang [1 ]
Wu, Zhen [1 ]
Zhang, Jianbing [1 ]
Dai, Xinyu [1 ]
Affiliations
[1] Nanjing University, National Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
dataset; multimodal; relation extraction; benchmark evaluation
DOI
10.1145/3581783.3612209
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Extracting relational facts from multimodal data is a crucial task in the fields of multimedia and knowledge graphs, one that supports a wide range of real-world applications. Recent studies have focused on recognizing relational facts whose entities both appear in a single modality, using information from other modalities only as a supplement. Such works, however, disregard a substantial number of multimodal relational facts that span modalities, e.g., one entity appearing in the text and the other in an image. In this paper, we propose a new task, Multimodal Object-Entity Relation Extraction, which aims to extract "object-entity" relational facts from paired image and text data. To facilitate research on this task, we introduce MORE, a new dataset comprising 21 relation types and 20,136 multimodal relational facts annotated on 3,522 pairs of textual news titles and corresponding images. To demonstrate the challenges of Multimodal Object-Entity Relation Extraction, we evaluate recent state-of-the-art methods for multimodal relation extraction and conduct a comprehensive experimental analysis on MORE. Our results reveal significant challenges for existing methods, underlining the need for further research on this task, and based on our experiments we identify several promising directions for future work. The MORE dataset and code are available at https://github.com/NJUNLP/MORE.
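As a rough illustration of what the task asks a model to produce, the Python sketch below shows one way an "object-entity" relational fact from a MORE-style example might be represented: an entity mention from the news title is linked to a visual object in the paired image by one of the dataset's 21 relation types. The field names, bounding-box format, and relation label are illustrative assumptions, not the dataset's actual schema; consult the linked repository for the real format.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class ObjectEntityFact:
        """One cross-modal relational fact: a textual entity paired with an image object."""
        title: str                               # textual news title
        image_path: str                          # path to the paired news image
        entity: str                              # entity mention found in the title
        object_bbox: Tuple[int, int, int, int]   # (x1, y1, x2, y2) of the visual object
        relation: str                            # one of the 21 relation types

    # Hypothetical example: the title entity "Mayor" is related to a
    # person-shaped object detected in the image. All values are invented.
    fact = ObjectEntityFact(
        title="Mayor opens the new city bridge",
        image_path="images/news_0001.jpg",
        entity="Mayor",
        object_bbox=(34, 50, 210, 400),
        relation="present_at",                   # illustrative relation label
    )
    print(fact.entity, "--", fact.relation)

Note that, unlike conventional multimodal relation extraction where both entities come from the text, one argument here is a visual object, which is what makes the task cross-modal.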
Pages: 4564 - 4573
Page count: 10