Two-Stream Convolutional Neural Network for Multimodal Matching

Cited by: 2
Authors
Zhang, Youcai [1]
Gu, Yiwei [1]
Gu, Xiaodong [1]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal matching; Two-stream network; Convolutional neural network
DOI
10.1007/978-3-030-01418-6_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal matching aims to establish relationships across different modalities such as image and text. Existing works mainly focus on maximizing the correlation between feature vectors extracted from off-the-shelf models, so feature extraction and matching form a two-stage learning process. This paper presents a novel two-stream convolutional neural network that integrates feature extraction and matching in an end-to-end manner. A visual stream and a textual stream are designed for feature extraction and are then concatenated with multiple shared layers for multimodal matching. The network is trained with an extreme multiclass classification loss by viewing each multimodal data item as a class. A fine-tuning step is then performed with a ranking constraint. Experimental results on the Flickr30k dataset demonstrate the effectiveness of the proposed network for multimodal matching.
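The two training objectives named in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption: the function names are hypothetical, the classification loss is plain softmax cross-entropy with one class per image-text pair, and the ranking constraint is approximated by a commonly used bidirectional hinge loss on cosine similarity with an arbitrary margin of 0.2. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_classification_loss(logits, instance_id):
    """Softmax cross-entropy where every image-text pair is its own class.

    `logits` has one entry per training pair (assumed setup), and
    `instance_id` is the index of the pair this sample belongs to.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[instance_id]


def bidirectional_ranking_loss(image_feats, text_feats, margin=0.2):
    """Hinge ranking loss over a batch of matched (image, text) pairs.

    Pair k is (image_feats[k], text_feats[k]). Each mismatched pairing
    is penalised when its similarity comes within `margin` of the
    matched pair's similarity, in both retrieval directions.
    """
    n = len(image_feats)
    sim = [[cosine(i, t) for t in text_feats] for i in image_feats]
    total = 0.0
    for k in range(n):
        pos = sim[k][k]
        for j in range(n):
            if j == k:
                continue
            total += max(0.0, margin - pos + sim[k][j])  # image k vs. wrong text j
            total += max(0.0, margin - pos + sim[j][k])  # text k vs. wrong image j
    return total / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    print(bidirectional_ranking_loss(images, texts))
    print(instance_classification_loss([0.0, 0.0], 0))
```

In a two-step schedule like the one the abstract describes, the classification loss would be used first and the ranking loss would drive the later fine-tuning; how the two are weighted or combined is not stated in the abstract.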
Pages: 14-21
Number of pages: 8