Supervised cross-modal factor analysis for multiple modal data classification

Cited by: 13
Authors
Wang, Jingbin [1 ,2 ]
Zhou, Yihua [3 ]
Duan, Kanghong [4 ]
Wang, Jim Jing-Yan [5 ]
Bensmail, Halima [6 ]
Affiliations
[1] Chinese Acad Sci, Natl Time Serv Ctr, Xian 710600, Peoples R China
[2] Chinese Acad Sci, Grad Univ, Beijing 100039, Peoples R China
[3] Lehigh Univ, Dept Mech Engn & Mech, Bethlehem, PA 18015 USA
[4] State Ocean Adm, North China Sea Marine Tech Support Ctr, Qingdao 266033, Peoples R China
[5] King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal 23955, Saudi Arabia
[6] Qatar Comp Res Inst, Doha 5825, Qatar
Source
2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS | 2015
Keywords
Multiple modal learning; Cross-modal factor analysis; Supervised learning; SPARSE REPRESENTATION; TEXT CLASSIFICATION; SURFACE; ACTIVATION; NETWORK
DOI
10.1109/SMC.2015.329
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
In this paper we study the problem of learning from multiple modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities into a shared data space, so that the classification of an image or a text can be performed directly in that space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data into a shared space by factor analysis, and then train a class-label predictor in that space so as to exploit the class label information. The factor analysis parameters and the predictor parameters are learned jointly by solving a single objective function. With this objective function, we minimize the distance between the projections of the image and the text of the same document, as well as the classification error of the projections measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.
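The abstract describes a joint objective that couples the two projections (the factor-analysis part) with a hinge-loss classifier trained in the shared space. Below is a minimal NumPy sketch of that idea, not the authors' implementation: it assumes toy dimensions, illustrative hyperparameters, and plain joint subgradient descent in place of the paper's alternating optimization; all names (X, Y, U, V, w) are hypothetical.

```python
# Sketch of supervised CFA-style learning (illustrative only, not the paper's code):
# project image features X and text features Y of the same documents into a shared
# space via U and V, while a linear hinge-loss classifier w is learned in that space.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n documents, image features (d_img), text features (d_txt), labels in {-1, +1}.
n, d_img, d_txt, d_shared = 200, 50, 30, 10
X = rng.normal(size=(n, d_img))                     # image modality
Y = rng.normal(size=(n, d_txt))                     # text modality
labels = np.where(rng.normal(size=n) > 0, 1.0, -1.0)

U = rng.normal(size=(d_img, d_shared)) * 0.01       # image projection
V = rng.normal(size=(d_txt, d_shared)) * 0.01       # text projection
w = np.zeros(d_shared)                              # classifier in the shared space

lam_cls, lam_reg, lr = 1.0, 0.1, 1e-3               # assumed trade-off and step sizes

def hinge_grad(Z, y, w):
    """Subgradients of the mean hinge loss mean(max(0, 1 - y * (Z @ w))) w.r.t. Z and w."""
    margins = 1.0 - y * (Z @ w)
    active = (margins > 0).astype(float)
    gZ = -(active * y)[:, None] * w[None, :] / len(y)
    gw = -((active * y)[:, None] * Z).sum(axis=0) / len(y)
    return gZ, gw

for it in range(500):
    Zx, Zy = X @ U, Y @ V                            # projections of both modalities
    diff = Zx - Zy                                   # cross-modal coupling term
    gZx_h, gw_x = hinge_grad(Zx, labels, w)          # hinge loss on image projections
    gZy_h, gw_y = hinge_grad(Zy, labels, w)          # hinge loss on text projections

    # Gradients of the joint objective
    # ||XU - YV||_F^2 + lam_cls * hinge + lam_reg * (||U||^2 + ||V||^2 + ||w||^2)
    gU = X.T @ (2 * diff + lam_cls * gZx_h) + 2 * lam_reg * U
    gV = Y.T @ (-2 * diff + lam_cls * gZy_h) + 2 * lam_reg * V
    gw = lam_cls * (gw_x + gw_y) + 2 * lam_reg * w

    U -= lr * gU
    V -= lr * gV
    w -= lr * gw

# At test time a single modality (image or text) is projected and then classified.
pred = np.sign((X @ U) @ w)
print("training accuracy (image modality):", (pred == labels).mean())
```

The coupling term ||XU - YV||_F^2 pulls the two projections of the same document together, while the shared classifier w injects the label information into both projections, which is the supervised extension over plain CFA that the abstract describes.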
Pages: 1882-1888
Number of pages: 7
Related papers
50 records in total
  • [31] MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval
    Zhu, Lei
    Cai, Liewu
    Song, Jiayu
    Zhu, Xinghui
    Zhang, Chengyuan
    Zhang, Shichao
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 631 - 638
  • [32] SCQ: Self-Supervised Cross-Modal Quantization for Unsupervised Large-Scale Retrieval
    Nakamura, Fuga
    Harakawa, Ryosuke
    Iwahashi, Masahiro
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1337 - 1342
  • [33] Multi-label enhancement based self-supervised deep cross-modal hashing
    Zou, Xitao
    Wu, Song
    Bakker, Erwin M.
    Wang, Xinzhi
    NEUROCOMPUTING, 2022, 467 : 138 - 162
  • [34] Cross-modal contrastive learning for unified placenta analysis using photographs
    Pan, Yimu
    Mehta, Manas
    Goldstein, Jeffery A.
    Ngonzi, Joseph
    Bebell, Lisa M.
    Roberts, Drucilla J.
    Carreon, Chrystalle Katte
    Gallagher, Kelly
    Walker, Rachel E.
    Gernand, Alison D.
    Wang, James Z.
    PATTERNS, 2024, 5 (12):
  • [35] A Classification Method for the Cellular Images Based on Active Learning and Cross-Modal Transfer Learning
    Vununu, Caleb
    Lee, Suk-Hwan
    Kwon, Ki-Ryong
    SENSORS, 2021, 21 (04) : 1 - 24
  • [36] Cross-modal and Cross-medium Adversarial Attack for Audio
    Zhang, Liguo
    Tian, Zilin
    Long, Yunfei
    Li, Sizhao
    Yin, Guisheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 444 - 453
  • [37] Semantics-Reconstructing Hashing for Cross-Modal Retrieval
    Zhang, Peng-Fei
    Huang, Zi
    Zhang, Zheng
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 315 - 327
  • [38] Deep fused two-step cross-modal hashing with multiple semantic supervision
    Peipei Kang
    Zehang Lin
    Zhenguo Yang
    Alexander M. Bronstein
    Qing Li
    Wenyin Liu
    Multimedia Tools and Applications, 2022, 81 : 15653 - 15670
  • [39] Deep fused two-step cross-modal hashing with multiple semantic supervision
    Kang, Peipei
    Lin, Zehang
    Yang, Zhenguo
    Bronstein, Alexander M.
    Li, Qing
    Liu, Wenyin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (11) : 15653 - 15670
  • [40] Cross-Modal Impact of Recent Word Encountering Experience
    Liu, Jiayu
    Gu, Junjuan
    Feng, Chen
    Shi, Weiting
    Biemann, Chris
    Li, Xingshan
    SCIENTIFIC STUDIES OF READING, 2024, 28 (02) : 101 - 119