A Discriminant Information Theoretic Learning Framework for Multi-modal Feature Representation

Cited by: 2
Authors
Gao, Lei [1 ]
Guan, Ling [1 ]
Affiliations
[1] Ryerson Univ, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Discriminative representation; complementary representation; information theoretic learning; multi-modal feature representation; image recognition; audio-visual emotion recognition; canonical correlation analysis; graph convolutional networks; feature fusion; level fusion; emotion; recognition; model; sparse; autoencoders; algorithms
DOI
10.1145/3587253
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As sensory and computing technology advances, multi-modal features have been playing a central role in ubiquitously representing patterns and phenomena for effective information analysis and recognition. As a result, multi-modal feature representation is becoming an increasingly significant direction of academic research and real-world applications. Nevertheless, numerous challenges remain, especially in the joint utilization of discriminative and complementary representations from multi-modal features. In this article, a discriminant information theoretic learning (DITL) framework is proposed to address these challenges. Under the proposed framework, the discriminative and complementary information within the given multi-modal features is exploited jointly, resulting in a high-quality feature representation. Based on the characteristics of the DITL framework, the newly generated feature representation is further optimized, leading to lower computational complexity and improved system performance. To demonstrate the effectiveness and generality of DITL, we conducted experiments on several recognition tasks, including both static cases, such as handwritten digit recognition, face recognition, and object recognition, and dynamic cases, such as video-based human emotion recognition and action recognition. The results show that the proposed framework outperforms state-of-the-art algorithms.
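The published DITL objective is not reproduced in this record, but the core idea the abstract describes, coupling a discriminant (class-separating) criterion with a correlation-based (complementarity) criterion across modalities, can be illustrated with a small sketch. The Python code below is an assumption-laden stand-in, not the authors' method: it performs CCA-style fusion of two modalities in which the cross-modal covariance is accumulated only over same-class sample pairs, a construction borrowed from discriminative CCA. The function name ditl_style_fuse and all of its parameters are hypothetical.

import numpy as np

def ditl_style_fuse(X, Y, labels, dim, reg=1e-3):
    # Hypothetical sketch, NOT the published DITL algorithm.
    # Complementarity: cross-modal correlation (CCA-style).
    # Discrimination: cross-covariance built from same-class pairs
    # only, so the correlated directions also tend to separate classes.
    X = X - X.mean(axis=0)              # center modality 1 (e.g., visual)
    Y = Y - Y.mean(axis=0)              # center modality 2 (e.g., audio)
    n, dx = X.shape
    dy = Y.shape[1]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(dx)   # ridge term for stability
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(dy)
    Cxy = np.zeros((dx, dy))
    for c in np.unique(labels):         # within-class cross-covariance
        sx = X[labels == c].sum(axis=0)
        sy = Y[labels == c].sum(axis=0)
        Cxy += np.outer(sx, sy)
    Cxy /= n
    # Whiten each modality, then take the SVD of the whitened
    # cross-covariance; this is the standard closed-form CCA solution.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, _, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :dim]               # projection for modality 1
    B = Wy.T @ Vt[:dim].T               # projection for modality 2
    return np.hstack([X @ A, Y @ B])    # serial fusion, shape (n, 2*dim)

For instance, given 200 samples of 50-D visual features X, 30-D audio features Y, and an integer label array, ditl_style_fuse(X, Y, labels, dim=10) returns a 20-dimensional fused representation that a downstream classifier can consume. The further optimization and complexity-reduction steps mentioned in the abstract have no counterpart in this sketch.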
Pages: 24
Related Papers
50 records total
  • [1] A Discriminative Vectorial Framework for Multi-Modal Feature Representation
    Gao, Lei
    Guan, Ling
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24: 1503-1514
  • [2] Exponential Multi-Modal Discriminant Feature Fusion for Small Sample Size
    Zhu, Yanmin
    Peng, Tianhao
    Su, Shuzhi
    IEEE ACCESS, 2022, 10: 14507-14517
  • [3] CLMTR: a generic framework for contrastive multi-modal trajectory representation learning
    Liang, Anqi
    Yao, Bin
    Xie, Jiong
    Zheng, Wenli
    Shen, Yanyan
    Ge, Qiqi
    GEOINFORMATICA, 2024: 233-253
  • [4] The integration of information in a digital, multi-modal learning environment
    Schueler, Anne
    LEARNING AND INSTRUCTION, 2019, 59: 76-87
  • [5] Efficient disentangled representation learning for multi-modal finger biometrics
    Yang, Weili
    Huang, Junduan
    Luo, Dacan
    Kang, Wenxiong
    PATTERN RECOGNITION, 2024, 145
  • [6] MFF: Multi-modal feature fusion for zero-shot learning
    Cao, Weipeng
    Wu, Yuhao
    Huang, Chengchao
    Patwary, Muhammed J. A.
    Wang, Xizhao
    NEUROCOMPUTING, 2022, 510: 172-180
  • [7] A Conversational Agent Framework with Multi-modal Personality Expression
    Sonlu, Sinan
    Gudukbay, Ugur
    Durupinar, Funda
    ACM TRANSACTIONS ON GRAPHICS, 2021, 40 (1)
  • [8] MDNNSyn: A Multi-Modal Deep Learning Framework for Drug Synergy Prediction
    Li, Lei
    Li, Haitao
    Ishdorj, Tseren-Onolt
    Zheng, Chunhou
    Su, Yansen
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10): 6225-6236
  • [9] A Multi-modal Metric Learning Framework for Time Series kNN Classification
    Cao-Tri Do
    Douzal-Chouakria, Ahlame
    Marie, Sylvain
    Rombaut, Michele
    ADVANCED ANALYSIS AND LEARNING ON TEMPORAL DATA, AALTD 2015, 2016, 9785: 131-143
  • [10] A Multi-Biometric Feature-Fusion Framework for Improved Uni-Modal and Multi-Modal Human Identification
    Brown, Dane
    Bradshaw, Karen
    2016 IEEE SYMPOSIUM ON TECHNOLOGIES FOR HOMELAND SECURITY (HST), 2016