CroMIC-QA: The Cross-Modal Information Complementation Based Question Answering

Cited: 0
Authors
Qian, Shun [1 ]
Liu, Bingquan [1 ]
Sun, Chengjie [1 ]
Xu, Zhen [1 ]
Ma, Lin [2 ]
Wang, Baoxun [3 ]
Affiliations
[1] Harbin Inst Technol, Fac Comp, Harbin 150001, Peoples R China
[2] Meituan Inc, Beijing 100091, Peoples R China
[3] Tencent Co Ltd, Beijing 100091, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Visualization; Semantics; Crops; Question answering (information retrieval); Diseases; Linguistics; Cross-modal semantic interaction; visual question answering; domain-specific datasets; multi-modal tasks
DOI
10.1109/TMM.2023.3326616
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This paper proposes a new multi-modal question-answering task, named Cross-Modal Information Complementation based Question Answering (CroMIC-QA), to promote exploration of bridging the semantic gap between visual and linguistic signals. The proposed task is inspired by the common phenomenon that, in most user-generated QA scenarios, the information in the given textual question is incomplete, so the semantics of both the text and the accompanying image must be merged to infer the complete real question. In this work, the CroMIC-QA task is first formally defined and compared with the classic Visual Question Answering (VQA) task. On this basis, a dedicated dataset, CroMIC-QA-Agri, is collected for the proposed task from an online QA community in the agriculture domain. A group of experiments is conducted on this dataset, with typical multi-modal deep architectures implemented and compared. The experimental results show that appropriate text/image representations and text-image semantic interaction methods effectively improve the performance of the framework.
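The abstract hinges on one technical idea: because the user's text alone under-specifies the question, image features must complement the question representation before answering. The sketch below illustrates one plausible realization in PyTorch; the class name CroMICFusionQA, all dimensions, the GRU text encoder, and the cross-attention fusion are illustrative assumptions, not the architecture evaluated in the paper.

# A minimal sketch of the CroMIC-QA setup described in the abstract: the
# incomplete textual question is fused with image features before answer
# classification. Module names, dimensions, and the fusion choice are
# assumptions for illustration, not the paper's method.
import torch
import torch.nn as nn

class CroMICFusionQA(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, num_answers=100):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.text_encoder = nn.GRU(d_model, d_model, batch_first=True)
        # Stand-in for a CNN/ViT backbone: projects precomputed region
        # features (e.g., 49 grid cells of dim 2048) into the shared space.
        self.image_proj = nn.Linear(2048, d_model)
        # Cross-modal interaction: question tokens attend to image regions,
        # letting the image "complete" the under-specified question.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, num_answers),
        )

    def forward(self, question_ids, image_feats):
        # question_ids: (B, T) token ids; image_feats: (B, R, 2048) regions
        q, _ = self.text_encoder(self.text_embed(question_ids))  # (B, T, d)
        v = self.image_proj(image_feats)                         # (B, R, d)
        q_complemented, _ = self.cross_attn(q, v, v)             # (B, T, d)
        # Pool the complemented and text-only views, classify over a fixed
        # answer set, as in standard VQA-style classification.
        fused = torch.cat([q_complemented.mean(1), q.mean(1)], dim=-1)
        return self.classifier(fused)

# Toy usage: batch of 2 questions (8 tokens) with 49 image regions each.
model = CroMICFusionQA()
logits = model(torch.randint(0, 30000, (2, 8)), torch.randn(2, 49, 2048))
print(logits.shape)  # torch.Size([2, 100])

Cross-attention is only one of the text-image semantic interaction methods the abstract alludes to; simpler baselines (concatenation, element-wise product) fit the same interface by swapping the fusion step.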
Pages: 8348-8359
Page count: 12
Related Papers
50 records in total (entries [21]-[30] shown)
  • [21] Qiu, Lin; Zhou, Hao; Qu, Yanru; Zhang, Weinan; Li, Suoheng; Rong, Shu; Ru, Dongyu; Qian, Lihua; Tu, Kewei; Yu, Yong. QA4IE: A Question Answering Based Framework for Information Extraction. SEMANTIC WEB - ISWC 2018, PT I, 2018, 11136: 198-216.
  • [22] Liu, Gang; He, Jinlong; Li, Pengfei; Zhong, Shenjun; Li, Hongyang; He, Genrong. Unified Transformer with Cross-Modal Mixture Experts for Remote-Sensing Visual Question Answering. REMOTE SENSING, 2023, 15 (19).
  • [23] Li, Yong; Yang, Qihao; Wang, Fu Lee; Lee, Lap-Kei; Qu, Yingying; Hao, Tianyong. Asymmetric cross-modal attention network with multimodal augmented mixup for medical visual question answering. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 144.
  • [24] Freides, D. Information Complexity and Cross-Modal Functions. BRITISH JOURNAL OF PSYCHOLOGY, 1975, 66 (AUG): 283-287.
  • [25] Tao, Rui; Zhu, Meng; Cao, Haiyan; Ren, Hong-E. Parameterization before Meta-Analysis: Cross-Modal Embedding Clustering for Forest Ecology Question-Answering. FORESTS, 2024, 15 (09).
  • [26] Alempijevic, Alen; Kodagoda, Sarath; Dissanayake, Gamini. Cross-Modal Localization Through Mutual Information. 2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009: 5597-5602.
  • [27] Smith, J. David; Johnston, Jennifer J. R.; Musgrave, Robert D.; Zakrzewski, Alexandria C.; Boomer, Joseph; Church, Barbara A.; Ashby, F. Gregory. Cross-modal information integration in category learning. ATTENTION PERCEPTION & PSYCHOPHYSICS, 2014, 76 (05): 1473-1484.
  • [28] Liang, Pei; Jiang, Jia-yu; Liu, Qiang; Zhang, Su-lin; Yang, Hua-jing. Mechanism of Cross-modal Information Influencing Taste. CURRENT MEDICAL SCIENCE, 2020, 40 (03): 474-479.
  • [30] Xu, J.-B.; Wei, X.; Zhou, L. Information Recovery Technology for Cross-Modal Communications. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2022, 50 (07): 1631-1642.