MEAS: Multimodal Emotion Analysis System for Short Videos on Social Media Platforms

Times Cited: 1
Authors
Wei, Qinglan [1 ,2 ,3 ]
Zhou, Yaqi [1 ,2 ,3 ]
Xiang, Shenlian [1 ,2 ,3 ]
Xiao, Longhui [1 ,2 ,3 ]
Zhang, Yuan [1 ,2 ,3 ]
Affiliations
[1] Commun Univ China, Sch Data Sci & Intelligent Media, Beijing 100024, Peoples R China
[2] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China
[3] Commun Univ China, State Key Lab Media Convergence & Commun, Beijing 100024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Videos; Annotations; Analytical models; Accuracy; Computational modeling; Social networking (online); Feature extraction; Speech recognition; Manuals; Affective computing; Emotion analysis; multimodal data; short videos; social media; FUSION; TRANSFORMER; RECOGNITION;
DOI
10.1109/TCSS.2024.3490846
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Short videos have surged in popularity on social media, and the emotions they express can trigger or even amplify public sentiment. Accurate computation of these emotions is therefore vital for social affective computing. However, multimodal emotion analysis of short videos on social platforms faces several challenges: inconsistent video resolutions strain model accuracy, while collecting large-scale social media data, manually transcribing and segmenting audio content, and precisely labeling the data demand substantial manpower. In this article, we propose MEAS, an affective computing system for social short videos that combines multiscale resolution adaptability with the advanced RoBERTa model to optimize the preprocessing of high-definition, large-size short videos and to increase the contribution of the text modality in emotion analysis. The system also adopts automatic audio segmentation and transcription to efficiently capture speech in social short videos. Experimental results show that, compared with the leading open-source algorithm V2EM on the IEMOCAP dataset, the proposed method improves weighted accuracy and F1 score by 4.17% and 7.29%, respectively. We also constructed a novel dataset named "Bili-news" from news short videos on social platforms to validate the effectiveness of the MEAS system. Through experimental verification, we further find a significant positive correlation between the emotions expressed in short videos and the social sentiments of the audience.
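The abstract reports gains in weighted accuracy (WA) and F1 on IEMOCAP. As an illustration only (not the authors' evaluation code), a minimal pure-Python sketch of these two metrics as they are commonly defined for IEMOCAP-style emotion classification: WA as the overall fraction of correctly classified samples, and F1 as the support-weighted mean of per-class F1 scores.

```python
def weighted_accuracy(y_true, y_pred):
    # WA: fraction of all samples classified correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    # Per-class F1, averaged with each class weighted by its support
    # (number of true samples of that class).
    n = len(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += f1 * sum(t == c for t in y_true) / n
    return total

# Toy example with three emotion classes:
truth = ["hap", "sad", "ang", "hap"]
preds = ["hap", "sad", "hap", "hap"]
print(weighted_accuracy(truth, preds))  # 0.75
print(weighted_f1(truth, preds))        # 0.65
```

Note that some IEMOCAP papers instead report unweighted accuracy (UA, the mean of per-class recalls); the sketch above follows the WA convention the abstract names.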
Pages: 13
References
48 references in total
[1] Busso, Carlos; Bulut, Murtaza; Lee, Chi-Chun; Kazemzadeh, Abe; Mower, Emily; Kim, Samuel; Chang, Jeannette N.; Lee, Sungbok; Narayanan, Shrikanth S. IEMOCAP: interactive emotional dyadic motion capture database [J]. Language Resources and Evaluation, 2008, 42(04): 335-359.
[2] Chen X. Proceedings of the 24th ACM International Conference on Multimedia, 2016. DOI: 10.1145/2964284.2964314.
[3] Chung H. W. arXiv, 2022. DOI: 10.48550/arXiv.2210.11416.
[4] Dai W. L. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), 2021: 5305.
[5] Dai W. L. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2020), 2020: 269.
[6] Devlin J. arXiv, 2019. arXiv:1810.04805.
[7] Ding, Xiaohan; Zhang, Xiangyu; Ma, Ningning; Han, Jungong; Ding, Guiguang; Sun, Jian. RepVGG: Making VGG-style ConvNets Great Again [C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 13728-13737.
[8] Ha-Nguyen Tran; Cambria, Erik. Ensemble application of ELM and GPU for real-time multimodal sentiment analysis [J]. Memetic Computing, 2018, 10(01): 3-13.
[9] Hasan M. K. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), 2019: 2046.
[10] He, Bo; Li, Hengduo; Jang, Young Kyun; Jia, Menglin; Cao, Xuefei; Shah, Ashish; Shrivastava, Abhinav; Lim, Ser-Nam. MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding [C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), 2024: 13504-13514.