TIVA-KG: A Multimodal Knowledge Graph with Text, Image, Video and Audio

Cited by: 7
Authors
Wang, Xin [1]
Meng, Benyuan [2]
Chen, Hong [2]
Meng, Yuan [2]
Lv, Ke [3]
Zhu, Wenwu [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
Knowledge Graph; Multimodal; Text; Video; Image; Audio
DOI
10.1145/3581783.3612266
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge graphs serve as a powerful tool to boost model performance in various applications, including computer vision, natural language processing, and multimedia data mining. Human knowledge acquisition is multimodal in essence, covering text, image, video and audio modalities. However, existing multimodal knowledge graphs fail to cover all four of these modalities simultaneously, which severely limits their expressive power for improving performance on downstream tasks. In this paper, we propose TIVA-KG, a multimodal Knowledge Graph covering Text, Image, Video and Audio, which can benefit various downstream tasks. TIVA-KG has two significant advantages over existing knowledge graphs: i) coverage of up to four modalities (text, image, video and audio), and ii) the capability of triplet grounding, which grounds multimodal relations to triples instead of entities. We further design a Quadruple Embedding Baseline (QEB) model to validate the necessity and efficacy of considering four modalities in a KG. We conduct extensive experiments to test TIVA-KG with various knowledge graph representation approaches on the link prediction task, demonstrating the benefits and necessity of introducing multiple modalities and triplet grounding. TIVA-KG is expected to promote further research in the community on mining multimodal knowledge graphs and on the relevant downstream tasks. TIVA-KG is now available at our website: http://mn.cs.tsinghua.edu.cn/tivakg.
Pages: 2391-2399
Page count: 9
Related papers
41 items in total
  • [1] Alberts Houda, 2021, P 1 WORKSH MULT REPR, P138, DOI 10.18653/V1/2021.MRL-1.13
  • [2] [Anonymous], 2013, P 22 INT C WORLD WID, P413, DOI 10.1145/2488388.2488425
  • [3] Balazevic I, 2019, 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019), P5185
  • [4] Bordes A., 2013, Advances in Neural Information Processing Systems, V26, P1, DOI 10.5555/2999792.2999923
  • [5] The WASABI Dataset: Cultural, Lyrics and Audio Analysis Metadata About 2 Million Popular Commercially Released Songs
    Buffa, Michel
    Cabrio, Elena
    Fell, Michael
    Gandon, Fabien
    Giboin, Alain
    Hennequin, Romain
    Michel, Franck
    Pauwels, Johan
    Pellerin, Guillaume
    Tikat, Maroua
    Winckler, Marco
    [J]. SEMANTIC WEB, ESWC 2021, 2021, 12731: 515-531
  • [6] Ebisu T, 2018, AAAI CONF ARTIF INTE, P1819
  • [7] Ferrada Sebastian, 2017, SEMWEB
  • [8] Galárraga L, 2015, VLDB J, V24, P707, DOI 10.1007/s00778-015-0394-1
  • [9] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
  • [10] Hershey S, 2017, INT CONF ACOUST SPEE, P131, DOI 10.1109/ICASSP.2017.7952132