A Discriminant Information Theoretic Learning Framework for Multi-modal Feature Representation

Cited by: 2
Authors
Gao, Lei [1]
Guan, Ling [1]
Affiliations
[1] Ryerson Univ, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Discriminative representation; complementary representation; information theoretic learning; multi-modal feature representation; image recognition; audio-visual emotion recognition; CANONICAL CORRELATION-ANALYSIS; GRAPH CONVOLUTIONAL NETWORKS; FEATURE FUSION; LEVEL FUSION; EMOTION; RECOGNITION; MODEL; SPARSE; AUTOENCODERS; ALGORITHMS
DOI
10.1145/3587253
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
As sensory and computing technology advances, multi-modal features have been playing a central role in ubiquitously representing patterns and phenomena for effective information analysis and recognition. As a result, multi-modal feature representation is becoming an increasingly significant direction of academic research and real-world applications. Nevertheless, numerous challenges remain, especially in the joint utilization of discriminative and complementary representations from multi-modal features. In this article, a discriminant information theoretic learning (DITL) framework is proposed to address these challenges. Under the proposed framework, the discriminative and complementary information within the given multi-modal features is exploited jointly, resulting in a high-quality feature representation. Based on the characteristics of the DITL framework, the newly generated feature representation is further optimized, leading to lower computational complexity and improved system performance. To demonstrate the effectiveness and generality of DITL, we conducted experiments on several recognition tasks, including static cases, such as handwritten digit recognition, face recognition, and object recognition, and dynamic cases, such as video-based human emotion recognition and action recognition. The results show that the proposed framework outperforms state-of-the-art algorithms.
Pages: 24
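
The abstract describes the framework only at a high level: project each modality so that class-discriminative structure is emphasized, then combine the modalities so that their complementary information is retained. The sketch below illustrates that general idea with a simple scatter-ratio projection per modality followed by concatenation, loosely in the spirit of discriminant correlation analysis. It is an assumption-laden toy, not the paper's DITL algorithm; the function names and the choice of objective are illustrative only.

```python
# Illustrative sketch ONLY -- not the DITL algorithm from the paper.
# It mimics the abstract's two goals: per-modality discrimination
# (scatter-ratio projections) and cross-modality complementation
# (concatenation of the projected features).
import numpy as np


def class_means(X, labels):
    """Stack the per-class mean vectors of X (one row per class)."""
    return np.vstack([X[labels == c].mean(axis=0) for c in np.unique(labels)])


def top_directions(Sb, St, k):
    """Leading directions of the generalized problem Sb w = lambda St w,
    solved by whitening with St^(-1/2) (St is positive definite here)."""
    vals, vecs = np.linalg.eigh(St)
    St_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    evals, evecs = np.linalg.eigh(St_inv_sqrt @ Sb @ St_inv_sqrt)
    return St_inv_sqrt @ evecs[:, np.argsort(evals)[::-1][:k]]


def fuse_discriminative(X, Y, labels, dim=2, eps=1e-6):
    """Project each modality discriminatively, then concatenate so the
    complementary parts of both modalities are kept."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    # Between-class scatter per modality: covariance of the class means.
    Mx, My = class_means(X, labels), class_means(Y, labels)
    Sbx, Sby = Mx.T @ Mx / len(Mx), My.T @ My / len(My)

    # Regularized total scatter acts as the normalization constraint.
    Stx = X.T @ X / len(X) + eps * np.eye(X.shape[1])
    Sty = Y.T @ Y / len(Y) + eps * np.eye(Y.shape[1])

    Wx = top_directions(Sbx, Stx, dim)
    Wy = top_directions(Sby, Sty, dim)
    return np.hstack([X @ Wx, Y @ Wy])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 30)
    X = rng.normal(size=(90, 10)) + labels[:, None]        # modality 1
    Y = rng.normal(size=(90, 8)) + 0.5 * labels[:, None]   # modality 2
    print(fuse_discriminative(X, Y, labels, dim=2).shape)  # (90, 4)
```

Concatenation appears here only because it is the simplest fusion rule that preserves both modalities; the paper's information theoretic objective would take the place of the scatter-ratio criterion in `top_directions`.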