Research on Modelling Capability of English Multimodal File Search based on Transformer

Citations: 0
Authors
Li, Hongjuan [1]
Affiliations
[1] Pingdingshan Polytech Coll, Coll Continuing Educ, Pingdingshan, Peoples R China
Keywords
Transformer; attention mechanism; multimodal; English document retrieval; STRATEGY; FUSION
DOI
10.34028/iajit/22/1/9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the exponential growth of file data in the multimedia era, file retrieval as a means of effective data management has become an active research field. To meet the need for English file search, this paper proposes an English multimodal file search model based on the Transformer. Ablation experiments on two public datasets and comparison experiments against a benchmark model verify the effectiveness and superiority of the proposed Transformer-based model in multimodal data processing. A multimodal fusion retrieval system can usually achieve better performance than a single-modal retrieval system. The experiments focus on three modalities: audio, image, and text. The results show that the proposed method not only improves the efficiency of file search but also extracts modal features and performs feature fusion better. Future work could explore other types of attention mechanisms or integrate different architectures to further enhance the feasibility and superiority of multimodal file search.
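The abstract does not describe the model's architecture, so the following is only a minimal sketch of one common way attention can fuse audio, image, and text features for retrieval. It assumes each file already has one fixed-length embedding per modality and that the query lives in the same vector space. The function names (`attention_fuse`, `rank_files`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modality_embeddings):
    """Fuse per-modality embeddings with scaled dot-product attention.

    Assumption: the query acts as the attention query, and each modality
    embedding serves as both key and value. Returns the fused vector and
    the attention weight assigned to each modality.
    """
    d = len(query)
    names = list(modality_embeddings)
    scores = [dot(query, modality_embeddings[n]) / math.sqrt(d) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * modality_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(d)
    ]
    return fused, dict(zip(names, weights))


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def rank_files(query, files):
    """Rank files by cosine similarity between the query and the fused vector."""
    scored = []
    for name, modalities in files.items():
        fused, _ = attention_fuse(query, modalities)
        scored.append((name, cosine(query, fused)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): 4-dimensional embeddings for two files.
QUERY = [1.0, 0.0, 0.0, 0.0]
FILES = {
    "lecture.mp4": {
        "audio": [0.9, 0.1, 0.0, 0.0],
        "image": [0.2, 0.8, 0.0, 0.0],
        "text": [1.0, 0.0, 0.0, 0.0],
    },
    "notes.pdf": {
        "audio": [0.0, 0.0, 1.0, 0.0],
        "image": [0.0, 1.0, 0.0, 0.0],
        "text": [0.0, 0.0, 0.0, 1.0],
    },
}

if __name__ == "__main__":
    for name, score in rank_files(QUERY, FILES):
        print(name, round(score, 3))
```

In this sketch, the modality whose embedding best matches the query receives the largest weight, so a file can be retrieved through whichever modality carries the relevant signal. A Transformer-based model would typically learn the query, key, and value projections and use multiple heads rather than the raw vectors used here.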
Pages: 116-123
Page count: 8