A Survey on Multi-modal Summarization

Cited by: 23
Authors
Jangra, Anubhav [1]
Mukherjee, Sourajit [2]
Jatowt, Adam [3, 4]
Saha, Sriparna [1]
Hasanuzzaman, Mohammad [5]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci, Patna 801106, Bihar, India
[2] Indian Inst Technol Patna, Dept Math, Patna, Bihar, India
[3] Univ Innsbruck, Dept Informat, Innsbruck, Austria
[4] Univ Innsbruck, DiSC, Innsbruck, Austria
[5] Cork Inst Technol, Dept Comp Sci, Cork, Ireland
Keywords
Summarization; multi-modal content processing; neural networks; FUSION; VIDEO; LANGUAGE; SALIENCY; REVIEWS
DOI
10.1145/3584700
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of representations, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this article, we present a comprehensive survey of the existing research in the area of MMS, covering various modalities such as text, image, audio, and video. Apart from highlighting the different evaluation metrics and datasets used for the MMS task, our work also discusses the current challenges and future directions in this field.
Pages: 36