Tri-factorized Modular Hypergraph Autoencoder for Multimodal Semantic Analysis


Authors:

Shaily Malik [1]
Geetika Dhand [1]
Kavita Sheoran [1]
Divya Jatain [1]
Vaani Garg [1]

Affiliations:

[1] Department of Computer Science and Engineering, Maharaja Surajmal Institute of Technology

Source:

SN Computer Science, Volume 5, Issue 7

Keywords:

TriNMF; Multimodal retrieval; Multimodality; Wiki dataset; Semantic analysis; Nonnegative matrix factorization

DOI:

10.1007/s42979-024-03210-8

Abstract:

For image-to-text and text-to-image classification, the features of data collected from various imaging devices and sensors, together with their text descriptions, must be mapped into a common latent space of reduced dimension. The low-dimensional features should retain as much information as possible with minimal loss. In this paper we propose a cross-modal semantic autoencoder that uses nonnegative matrix factorization (NMF) to factorize the features into a lower-rank representation. Because traditional NMF uses only two factor matrices, it cannot carry all of the information into the lower-dimensional space. We address this with a novel tri-factorized NMF with hypergraph regularization. Instead of the feature adjacency matrix, a more information-rich modularity matrix is used in the hypergraph regularization. The tri-factorized, hypergraph-regularized multimodal autoencoder is evaluated on the Wiki dataset for image-to-text and text-to-image conversion. The autoencoder also enables Multimodal Conditional Principal Label Space Transformation (MCPLST) to lower the feature dimension. Compared with the semantic autoencoder, the proposed autoencoder improved classification accuracy by up to 1.8%.
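
The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the standard multiplicative updates for unconstrained nonnegative tri-factorization, X ≈ F S Gᵀ, and Newman's modularity matrix B = A − d dᵀ / (2m) for an ordinary graph with at least one edge. The hypergraph regularization term and the autoencoder itself are omitted, and the names `tri_nmf` and `modularity_matrix` are hypothetical.

```python
import numpy as np


def modularity_matrix(A):
    """Newman modularity matrix B = A - d d^T / (2m) for a symmetric adjacency A.

    Assumption: A describes an ordinary undirected graph with at least one edge;
    the paper's hypergraph variant may be defined differently.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)      # node degrees
    two_m = d.sum()        # twice the total edge weight
    return A - np.outer(d, d) / two_m


def tri_nmf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (n x m) as F @ S @ G.T with nonnegative factors.

    Uses plain multiplicative updates on the squared Frobenius error,
    without any regularization term.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled elementwise, so nonnegativity is preserved.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Usage on random nonnegative data (illustrative only, not the Wiki dataset).
X_demo = np.random.default_rng(42).random((40, 25))
F_demo, S_demo, G_demo, err_demo = tri_nmf(X_demo, k1=6, k2=4, n_iter=100)
print("reconstruction error, first vs last iteration:", err_demo[0], err_demo[-1])
```

The middle factor S gives the factorization extra degrees of freedom compared with two-factor NMF, since the row and column factors F and G may have different ranks k1 and k2.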
