Research on Modelling Capability of English Multimodal File Search based on Transformer

Cited by: 0

Author:

Li, Hongjuan [1]

Affiliation:

[1] Pingdingshan Polytech Coll, Coll Continuing Educ, Pingdingshan, Peoples R China

Source:

INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY | 2025 / Vol. 22 / No. 01

Keywords:

Transformer; attention mechanism; multimodal; English document retrieval; strategy; fusion

DOI:

10.34028/iajit/22/1/9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the exponential growth of file data in the multimedia era, file retrieval for effective data management has become an active research field. Motivated by people's needs for English file search, this paper proposes a Transformer-based English multimodal file search model. Ablation experiments on two public data sets, together with comparison experiments against the benchmark model, verify the effectiveness and superiority of the proposed Transformer model in multimodal data processing. A multimodal fusion retrieval system can usually achieve better performance than a single-modal retrieval system. The experiments focus on three modalities: audio, image, and text. The results show that the proposed method not only improves the efficiency of file search but also extracts modal features and performs feature fusion better. Future work could explore other attention mechanisms or integrate different architectures to further enhance the feasibility and superiority of multimodal file search.

Pages: 116-123

Number of pages: 8

