Supervised cross-modal factor analysis for multiple modal data classification

Cited by: 13
Authors
Wang, Jingbin [1 ,2 ]
Zhou, Yihua [3 ]
Duan, Kanghong [4 ]
Wang, Jim Jing-Yan [5 ]
Bensmail, Halima [6 ]
Affiliations
[1] Chinese Acad Sci, Natl Time Serv Ctr, Xian 710600, Peoples R China
[2] Chinese Acad Sci, Grad Univ, Beijing 100039, Peoples R China
[3] Lehigh Univ, Dept Mech Engn & Mech, Bethlehem, PA 18015 USA
[4] State Ocean Adm, North China Sea Marine Tech Support Ctr, Qingdao 266033, Peoples R China
[5] King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal 23955, Saudi Arabia
[6] Qatar Comp Res Inst, Doha 5825, Qatar
Source
2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS | 2015
Keywords
Multiple modal learning; Cross-modal factor analysis; Supervised learning; SPARSE REPRESENTATION; TEXT CLASSIFICATION; SURFACE; ACTIVATION; NETWORK;
DOI
10.1109/SMC.2015.329
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we study the problem of learning from multiple modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities into a shared data space, so that an image or a text can be classified directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data into a shared data space by factor analysis, and then train a class label predictor in the shared space to exploit the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving a single objective function. With this objective function, we minimize both the distance between the projections of the image and the text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.
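The joint objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the binary labels in {-1, +1}, the linear predictor `w`, the trade-off weight `C`, the plain subgradient updates, and all function and variable names below are assumptions made for illustration. Any constraints or regularization the paper places on the projection matrices are omitted.

```python
import numpy as np


def hinge(margins):
    """Element-wise hinge loss max(0, 1 - margin)."""
    return np.maximum(0.0, 1.0 - margins)


def joint_objective(X_img, X_txt, y, P_img, P_txt, w, C=1.0):
    """Distance between paired projections plus C times the hinge loss.

    X_img: (n, d_img) image features, X_txt: (n, d_txt) text features,
    y: (n,) labels in {-1, +1} (assumed binary for this sketch),
    P_img, P_txt: projection matrices into a shared k-dimensional space,
    w: (k,) linear predictor applied in the shared space.
    """
    Z_img = X_img @ P_img  # image projections
    Z_txt = X_txt @ P_txt  # text projections
    distance = np.sum((Z_img - Z_txt) ** 2)
    loss = hinge(y * (Z_img @ w)).sum() + hinge(y * (Z_txt @ w)).sum()
    return distance + C * loss


def alternate_step(X_img, X_txt, y, P_img, P_txt, w, C=1.0, lr=1e-3):
    """One round of alternating subgradient updates (illustrative only)."""
    # Step 1: update both projections while the predictor w is fixed.
    Z_img, Z_txt = X_img @ P_img, X_txt @ P_txt
    diff = Z_img - Z_txt
    act_img = (y * (Z_img @ w) < 1.0) * y  # signed mask of margin violations
    act_txt = (y * (Z_txt @ w) < 1.0) * y
    g_img = 2.0 * X_img.T @ diff - C * np.outer(X_img.T @ act_img, w)
    g_txt = -2.0 * X_txt.T @ diff - C * np.outer(X_txt.T @ act_txt, w)
    P_img = P_img - lr * g_img
    P_txt = P_txt - lr * g_txt

    # Step 2: update the predictor while the projections are fixed.
    Z_img, Z_txt = X_img @ P_img, X_txt @ P_txt
    act_img = (y * (Z_img @ w) < 1.0) * y
    act_txt = (y * (Z_txt @ w) < 1.0) * y
    g_w = -C * (Z_img.T @ act_img + Z_txt.T @ act_txt)
    w = w - lr * g_w
    return P_img, P_txt, w
```

Calling `alternate_step` repeatedly and monitoring `joint_objective` gives a rough picture of the iterative scheme: the first term pulls the image and text projections of the same document together, while the second term pushes both projections to the correct side of the decision boundary.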
Pages: 1882-1888
Page count: 7