CroMIC-QA: The Cross-Modal Information Complementation Based Question Answering

Cited: 0
Authors
Qian, Shun [1 ]
Liu, Bingquan [1 ]
Sun, Chengjie [1 ]
Xu, Zhen [1 ]
Ma, Lin [2 ]
Wang, Baoxun [3 ]
Affiliations
[1] Harbin Inst Technol, Fac Comp, Harbin 150001, Peoples R China
[2] Meituan Inc, Beijing 100091, Peoples R China
[3] Tencent Co Ltd, Beijing 100091, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Visualization; Semantics; Crops; Question answering (information retrieval); Diseases; Linguistics; Cross-modal semantic interaction; visual question answering; domain-specific datasets; multi-modal tasks;
DOI
10.1109/TMM.2023.3326616
CLC number
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
This paper proposes a new multi-modal question-answering task, named Cross-Modal Information Complementation based Question Answering (CroMIC-QA), to promote exploration of bridging the semantic gap between visual and linguistic signals. The proposed task is inspired by the common phenomenon that, in most user-generated QA scenarios, the information in the given textual question is incomplete, so the semantics of both the text and the accompanying image must be merged to infer the complete real question. In this work, the CroMIC-QA task is first formally defined and compared with the classic Visual Question Answering (VQA) task. On this basis, a dedicated dataset, CroMIC-QA-Agri, is collected from an online QA community in the agriculture domain for the proposed task. A set of experiments is conducted on this dataset, with typical multi-modal deep architectures implemented and compared. The experimental results show that appropriate text/image representations and text-image semantic interaction methods are effective in improving the performance of the framework.
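The abstract describes merging the semantics of an incomplete textual question with the accompanying image before answering. A minimal sketch of such a late-fusion baseline is shown below; it is not the paper's actual architecture, and all module names, feature dimensions, and the PyTorch framing are illustrative assumptions.

```python
# Minimal late-fusion sketch for a CroMIC-QA-style setup (illustrative only,
# not the authors' implementation): encode the incomplete question and the
# image separately, fuse the two representations, and classify over a fixed
# answer set. Dimensions and module choices are assumptions.
import torch
import torch.nn as nn


class LateFusionQA(nn.Module):
    def __init__(self, vocab_size, num_answers, text_dim=256, image_dim=2048, hidden_dim=512):
        super().__init__()
        # Text encoder: embedding + GRU over the (possibly incomplete) question tokens.
        self.embed = nn.Embedding(vocab_size, text_dim, padding_idx=0)
        self.text_rnn = nn.GRU(text_dim, hidden_dim, batch_first=True)
        # Image features are assumed to come from a pretrained CNN (e.g. 2048-d pooled features).
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Fusion by element-wise product, followed by answer classification.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, question_ids, image_feats):
        # question_ids: (batch, seq_len) token ids; image_feats: (batch, image_dim)
        _, h_text = self.text_rnn(self.embed(question_ids))  # (1, batch, hidden_dim)
        h_text = h_text.squeeze(0)
        h_image = torch.relu(self.image_proj(image_feats))
        fused = h_text * h_image  # complement the textual question with visual semantics
        return self.classifier(fused)  # answer logits


# Example usage with random inputs.
model = LateFusionQA(vocab_size=10000, num_answers=100)
logits = model(torch.randint(1, 10000, (4, 20)), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 100])
```

More elaborate text-image interaction (e.g. cross-attention between question tokens and image regions) would replace the element-wise product above; the paper compares several such interaction methods.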
Pages: 8348-8359
Number of pages: 12
Related papers (50 in total; 10 shown below)
  • [1] Cross-Modal Retrieval for Knowledge-Based Visual Question Answering
    Lerner, Paul
    Ferret, Olivier
    Guinaudeau, Camille
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, 2024, 14608 : 421 - 438
  • [2] Cross-modal knowledge reasoning for knowledge-based visual question answering
    Yu, Jing
    Zhu, Zihao
    Wang, Yujing
    Zhang, Weifeng
    Hu, Yue
    Tan, Jianlong
    PATTERN RECOGNITION, 2020, 108
  • [3] Lightweight recurrent cross-modal encoder for video question answering
    Immanuel, Steve Andreas
    Jeong, Cheol
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [4] Cross-Modal Visual Question Answering for Remote Sensing Data
    Felix, Rafael
    Repasky, Boris
    Hodge, Samuel
    Zolfaghari, Reza
    Abbasnejad, Ehsan
    Sherrah, Jamie
    2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021, : 57 - 65
  • [5] Cross-modal Relational Reasoning Network for Visual Question Answering
    Chen, Hongyu
    Liu, Ruifang
    Peng, Bo
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3939 - 3948
  • [6] Visual question answering with attention transfer and a cross-modal gating mechanism
    Li, Wei
    Sun, Jianhui
    Liu, Ge
    Zhao, Linglan
    Fang, Xiangzhong
    PATTERN RECOGNITION LETTERS, 2020, 133: 334 - 340
  • [7] Gated Multi-modal Fusion with Cross-modal Contrastive Learning for Video Question Answering
    Lyu, Chenyang
    Li, Wenxi
    Ji, Tianbo
    Zhou, Liting
    Gurrin, Cathal
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 427 - 438
  • [8] Medical visual question answering with symmetric interaction attention and cross-modal gating
    Chen, Zhi
    Zou, Beiji
    Dai, Yulan
    Zhu, Chengzhang
    Kong, Guilan
    Zhang, Wensheng
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85
  • [9] Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering
    Reichman, Benjamin
    Heck, Larry
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2829 - 2834
  • [10] Jointly Learning Attentions with Semantic Cross-Modal Correlation for Visual Question Answering
    Cao, Liangfu
    Gao, Lianli
    Song, Jingkuan
    Xu, Xing
    Shen, Heng Tao
    DATABASES THEORY AND APPLICATIONS, ADC 2017, 2017, 10538 : 248 - 260