CroMIC-QA: The Cross-Modal Information Complementation Based Question Answering

Cited by: 0
Authors
Qian, Shun [1 ]
Liu, Bingquan [1 ]
Sun, Chengjie [1 ]
Xu, Zhen [1 ]
Ma, Lin [2 ]
Wang, Baoxun [3 ]
Affiliations
[1] Harbin Inst Technol, Fac Comp, Harbin 150001, Peoples R China
[2] Meituan Inc, Beijing 100091, Peoples R China
[3] Tencent Co Ltd, Beijing 100091, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Visualization; Semantics; Crops; Question answering (information retrieval); Diseases; Linguistics; Cross-modal semantic interaction; visual question answering; domain-specific datasets; multi-modal tasks;
DOI
10.1109/TMM.2023.3326616
CLC classification
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
This paper proposes a new multi-modal question-answering task, named Cross-Modal Information Complementation based Question Answering (CroMIC-QA), to promote exploration of bridging the semantic gap between visual and linguistic signals. The proposed task is inspired by the common phenomenon that, in most user-generated QA scenarios, the information in the given textual question is incomplete, so the semantics of both the text and the accompanying image must be merged to infer the complete real question. In this work, the CroMIC-QA task is first formally defined and compared with the classic Visual Question Answering (VQA) task. On this basis, a dedicated dataset, CroMIC-QA-Agri, is collected for the proposed task from an online QA community in the agriculture domain. A group of experiments is conducted on this dataset, with typical multi-modal deep architectures implemented and compared. The experimental results show that appropriate text/image representations and text-image semantic interaction methods effectively improve the performance of the framework.
Pages: 8348-8359
Page count: 12
Related papers (50 total)
  • [41] GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition
    Li, Jiang
    Wang, Xiaoping
    Lv, Guoqing
    Zeng, Zhigang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 77 - 89
  • [42] Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering
    Liu, Gang
    He, Jinlong
    Li, Pengfei
    Zhao, Zixu
    Zhong, Shenjun
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 160
  • [43] Robust visual question answering via semantic cross modal augmentation
    Mashrur, Akib
    Luo, Wei
    Zaidi, Nayyar A.
    Robles-Kelly, Antonio
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 238
  • [44] Comparative analysis on cross-modal information retrieval: A review
    Kaur, Parminder
    Pannu, Husanbir Singh
    Malhi, Avleen Kaur
    COMPUTER SCIENCE REVIEW, 2021, 39
  • [45] Exploring and Distilling Cross-Modal Information for Image Captioning
    Liu, Fenglin
    Ren, Xuancheng
    Liu, Yuanxin
    Lei, Kai
    Sun, Xu
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5095 - 5101
  • [46] Evaluating an Interface for Cross-Modal Collaborative Information Seeking
    AL-Thani, Dena
    Stockman, Tony
    INTERACTING WITH COMPUTERS, 2018, 30 (05) : 396 - 416
  • [47] Cross-modal impacts of anthropogenic noise on information use
    Morris-Drake, Amy
    Kern, Julie M.
    Radford, Andrew N.
    CURRENT BIOLOGY, 2016, 26 (20) : R911 - R912
  • [48] Deep Mutual Information Maximin for Cross-Modal Clustering
    Mao, Yiqiao
    Yan, Xiaoqiang
    Guo, Qiang
    Ye, Yangdong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8893 - 8901
  • [49] Cross-modal information fusion for voice spoofing detection
    Xue, Junxiao
    Zhou, Hao
    Song, Huawei
    Wu, Bin
    Shi, Lei
    SPEECH COMMUNICATION, 2023, 147 : 41 - 50
  • [50] Cross-modal information flows in highly automated vehicles
    Savchenko, V. V.
    Poddubko, S. N.
    INTERNATIONAL AUTOMOBILE SCIENTIFIC FORUM (IASF-2018), INTELLIGENT TRANSPORT SYSTEM TECHNOLOGIES AND COMPONENTS, 2019, 534