A Survey on Multi-modal Summarization

Cited by: 23

Authors:

Jangra, Anubhav ^{[1]}

Mukherjee, Sourajit ^{[2]}

Jatowt, Adam ^{[3,4]}

Saha, Sriparna ^{[1]}

Hasanuzzaman, Mohammad ^{[5]}

Affiliations:

[1] Indian Inst Technol Patna, Dept Comp Sci, Patna 801106, Bihar, India

[2] Indian Inst Technol Patna, Dept Math, Patna, Bihar, India

[3] Univ Innsbruck, Dept Informat, Innsbruck, Austria

[4] Univ Innsbruck, DiSC, Innsbruck, Austria

[5] Cork Inst Technol, Dept Comp Sci, Cork, Ireland

Source:

ACM COMPUTING SURVEYS | 2023 / Vol. 55 / Issue 13S

Keywords:

Summarization; multi-modal content processing; neural networks; FUSION; VIDEO; LANGUAGE; SALIENCY; REVIEWS;

DOI:

10.1145/3584700

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of representations, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this article, we present a comprehensive survey of the existing research in the area of MMS, covering various modalities such as text, image, audio, and video. Apart from highlighting the different evaluation metrics and datasets used for the MMS task, our work also discusses the current challenges and future directions in this field.

Pages: 36
