A Joint Model for Text and Image Semantic Feature Extraction

Cited by: 8
Authors
Cao, Jiarun [1 ]
Wang, Chongwen [1 ]
Gao, Liming [1 ]
Affiliations
[1] Beijing Inst Technol, Digital Media Res Inst, Beijing, Peoples R China
Source
2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018) | 2018
Keywords
Natural language processing; Information retrieval; Similarity calculation
DOI
10.1145/3302425.3302437
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most current information retrieval methods rely on keywords appearing in the text or on statistical information derived from word counts. Additional semantic information, such as synonyms and polysemous words, can also be incorporated to improve similarity accuracy and filtering. However, on today's web, in addition to the large number of new words coined every day, pictures, audio, video, and other kinds of information also continually appear. Manually designed features are therefore difficult to apply to such newly appearing data, and low-dimensional feature abstractions struggle to represent the overall semantics of text and images. In this paper, we propose a semantic feature extraction algorithm based on a deep network, which applies a local attention mechanism to the feature generation models for pictures and text. The retrieval of text and image information is converted into a similarity calculation between vectors, which improves retrieval speed while preserving the semantic relevance of the results. Training and testing of the text and image feature extraction models were carried out on many years of compiled news text and image data; the results show that the deep feature model has clear advantages in semantic expression and feature extraction. In addition, incorporating the similarity calculation into the training process further improves retrieval accuracy.
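The abstract states that retrieval is converted into a similarity calculation between feature vectors. A minimal sketch of that retrieval step, assuming text and images have already been mapped into a shared embedding space by the deep models (the toy vectors and names below are illustrative, not from the paper):

```python
# Sketch: once text and images share an embedding space, cross-modal retrieval
# reduces to ranking candidates by vector similarity (cosine similarity here).
# The embeddings are made-up toy vectors; in the paper they would come from
# the deep feature extraction models.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, candidates):
    """Return (id, score) pairs sorted by descending similarity to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in candidates.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)

# Toy example: a text-query embedding ranked against three image embeddings.
query = [0.9, 0.1, 0.2]
images = {
    "img_a": [0.8, 0.2, 0.1],
    "img_b": [0.1, 0.9, 0.3],
    "img_c": [0.0, 0.2, 0.9],
}
ranking = rank_by_similarity(query, images)
print(ranking[0][0])  # prints "img_a", the most similar image
```

Because ranking only involves dot products and norms, this step is fast at query time, which is the speed advantage the abstract claims over keyword matching.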
Pages: 8