Multimodal Emotion Recognition with Auxiliary Sentiment Information

Cited by: 0
Authors
Wu L. [1]
Liu Q. [1]
Zhang D. [1]
Wang J. [1]
Li S. [1]
Zhou G. [1]
Affiliations
[1] School of Computer Science & Technology, Soochow University, Suzhou
Source
Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis | 2020, Vol. 56, No. 1
Keywords
Emotion recognition; Joint learning; Multimodal; Sentiment analysis
DOI
10.13209/j.0479-8023.2019.105
Abstract
Different from previous studies that use only text, this paper focuses on multimodal data (text and audio) to perform emotion recognition. To jointly address the characteristics of multimodal data, we propose a novel joint learning framework in which an auxiliary task (multimodal sentiment classification) helps the main task (multimodal emotion classification). Specifically, private neural layers are first designed for the text and audio modalities of the main task to learn the uni-modal independent dynamics. Second, with the shared neural layers of the auxiliary task, we obtain the uni-modal representations of the auxiliary task and the auxiliary representations of the main task; the uni-modal independent dynamics are combined with the auxiliary representations of each modality to acquire the uni-modal representations of the main task. Finally, to capture multimodal interactive dynamics, we fuse the text and audio representations of the main and auxiliary tasks separately with a self-attention mechanism to obtain the final multimodal emotion and sentiment representations. Empirical results demonstrate the effectiveness of our approach on the multimodal emotion classification task as well as on the sentiment classification task. © 2020 Peking University.
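To make the described architecture concrete, the following is a minimal PyTorch-style sketch of the joint framework as summarized in the abstract. The GRU encoders, hidden sizes, additive combination of private and auxiliary representations, shared attention module, and all identifiers (e.g. JointEmotionSentimentModel, text_private) are illustrative assumptions, not the authors' reported implementation.

import torch
import torch.nn as nn

class JointEmotionSentimentModel(nn.Module):
    # Hypothetical sketch: main task = multimodal emotion classification,
    # auxiliary task = multimodal sentiment classification.
    def __init__(self, text_dim, audio_dim, hidden=128, n_emotions=6, n_sentiments=3):
        super().__init__()
        # Private layers of the main task, one encoder per modality.
        self.text_private = nn.GRU(text_dim, hidden, batch_first=True)
        self.audio_private = nn.GRU(audio_dim, hidden, batch_first=True)
        # Shared layers of the auxiliary task, also serving as auxiliary
        # representations for the main task.
        self.text_shared = nn.GRU(text_dim, hidden, batch_first=True)
        self.audio_shared = nn.GRU(audio_dim, hidden, batch_first=True)
        # Self-attention used to fuse the two modalities, applied separately per task.
        self.fusion = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.sentiment_head = nn.Linear(hidden, n_sentiments)

    def _encode(self, rnn, x):
        # Final hidden state of a GRU encoder: (batch, hidden).
        _, h = rnn(x)
        return h[-1]

    def forward(self, text, audio):
        # Uni-modal independent dynamics of the main task (private layers).
        t_priv = self._encode(self.text_private, text)
        a_priv = self._encode(self.audio_private, audio)
        # Uni-modal representations of the auxiliary task (shared layers).
        t_aux = self._encode(self.text_shared, text)
        a_aux = self._encode(self.audio_shared, audio)
        # Combine private dynamics with auxiliary representations per modality
        # (simple addition here; the combination operator is an assumption).
        t_main, a_main = t_priv + t_aux, a_priv + a_aux
        # Fuse text and audio with self-attention, separately for each task.
        main_seq = torch.stack([t_main, a_main], dim=1)   # (batch, 2, hidden)
        aux_seq = torch.stack([t_aux, a_aux], dim=1)
        main_fused, _ = self.fusion(main_seq, main_seq, main_seq)
        aux_fused, _ = self.fusion(aux_seq, aux_seq, aux_seq)
        emotion_logits = self.emotion_head(main_fused.mean(dim=1))
        sentiment_logits = self.sentiment_head(aux_fused.mean(dim=1))
        return emotion_logits, sentiment_logits

In training, the emotion and sentiment losses would typically be summed so that the shared layers let the auxiliary sentiment signal inform the main emotion task.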
Pages: 75-81
Number of pages: 6