Label distribution for multimodal machine learning

Cited by: 15
Authors
Ren, Yi [1]
Xu, Ning [1]
Ling, Miaogen [1]
Geng, Xin [1]
Affiliations
[1] Southeast Univ, Dept Comp Sci & Engn, Nanjing 211189, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multimodal machine learning; label distribution learning; sentiment analysis; disease prediction; fusion
DOI
10.1007/s11704-021-0611-6
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multimodal machine learning (MML) aims to understand the world from multiple related modalities. It has attracted much attention as multimodal data has become increasingly available in real-world applications. MML has been shown to perform better than single-modal machine learning, since multiple modalities contain more information and can complement each other. However, fusing the modalities remains a key challenge in MML. Unlike previous work, we further consider the side-information, which reflects the situation and influences the fusion of the modalities. We recover the multimodal label distribution (MLD) by leveraging the side-information; the MLD represents the degree to which each modality contributes to describing the instance. Accordingly, a novel framework named multimodal label distribution learning (MLDL) is proposed to recover the MLD and to fuse the modalities under its guidance, so as to learn an in-depth understanding of the joint feature representation. Moreover, two versions of MLDL are proposed to deal with sequential data. Experiments on multimodal sentiment analysis and disease prediction show that the proposed approaches perform favorably against state-of-the-art methods.
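The idea of a distribution over modalities guiding fusion can be illustrated with a minimal Python sketch. This is not the authors' MLDL implementation. It assumes that each modality already has a fixed-length feature vector and a scalar relevance score (in the paper these degrees are recovered from the side-information; here the scores are hypothetical hand-set inputs), and it swaps in a plain softmax followed by a weighted sum. The names `softmax`, `fuse`, `features`, and `scores` are illustrative only.

```python
import math

def softmax(scores):
    """Turn arbitrary real scores into a distribution that sums to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(modality_features, relevance_scores):
    """Weighted sum of equal-length per-modality feature vectors.

    relevance_scores are hypothetical stand-ins for whatever a learned
    model would derive from the side-information.
    """
    weights = softmax(relevance_scores)  # one weight per modality
    dim = len(modality_features[0])
    fused = [0.0] * dim
    for w, feats in zip(weights, modality_features):
        for i in range(dim):
            fused[i] += w * feats[i]
    return weights, fused

# Toy example: three modalities (e.g. text, audio, visual), 2-d features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give equal weights
weights, fused = fuse(features, scores)
```

With equal scores every modality receives the same weight; raising one score shifts the fused vector toward that modality's features.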
Pages: 11