PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models

Cited: 6
Authors
Prakash, Nirmalendu [1 ]
Wang, Han [1 ]
Hoang, Nguyen Khoi [2 ]
Hee, Ming Shan [1 ]
Lee, Roy Ka-Wei [1 ]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
[2] VinUniversity, Hanoi, Vietnam
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
meme; multimodal; topic modeling; large language models; internet memes
DOI
10.1145/3581783.3613836
CLC classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of social media has given rise to a new form of communication: memes. Memes are multimodal and often contain a combination of text and visual elements that convey meaning, humor, and cultural significance. While meme analysis has been an active area of research, little work has been done on unsupervised multimodal topic modeling of memes, which is important for content moderation, social media analysis, and cultural studies. We propose PromptMTopic, a novel multimodal prompt-based model designed to learn topics from both text and visual modalities by leveraging the language modeling capabilities of large language models. Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities. We evaluate our proposed model through extensive experiments on three real-world meme datasets, which demonstrate its superiority over state-of-the-art topic modeling baselines in learning descriptive topics in memes. Additionally, our qualitative analysis shows that PromptMTopic can identify meaningful and culturally relevant topics from memes. Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
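The abstract describes a two-stage pipeline: prompt an LLM to name a topic for each meme from its visual and textual content, then merge topics that the model judges to share a theme. A minimal sketch of that flow, with hypothetical helper names (`extract_topic`, `merge_topics`) and toy stand-ins replacing real LLM calls, since the paper's actual prompts and merging strategy may differ:

```python
def extract_topic(caption: str, ocr_text: str, llm) -> str:
    """Prompt an LLM to name the topic of one meme from both modalities."""
    prompt = (
        "A meme shows the image: '{}' with overlaid text: '{}'. "
        "Name its topic in one or two words.".format(caption, ocr_text)
    )
    return llm(prompt).strip().lower()

def merge_topics(topics, same_theme):
    """Greedily merge topic labels an LLM judges to share a theme."""
    clusters = []
    for t in topics:
        for cluster in clusters:
            if same_theme(cluster[0], t):  # ask the LLM: same theme?
                cluster.append(t)
                break
        else:
            clusters.append([t])
    return clusters

# Toy stand-ins so the sketch runs without an API key; a real pipeline
# would call an LLM (and an image captioner / OCR model) here instead.
def toy_llm(prompt):
    return "politics" if "election" in prompt else "sports"

def toy_judge(a, b):
    return a == b

memes = [("a podium", "vote in the election"), ("a stadium", "game day")]
topics = [extract_topic(cap, txt, toy_llm) for cap, txt in memes]
clusters = merge_topics(topics, toy_judge)
```

With the toy stand-ins, the two memes yield distinct topics and therefore two singleton clusters; with a real LLM, the judge would also merge paraphrased labels such as "elections" and "politics".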
Pages: 621-631 (11 pages)