A bio-inspired positional embedding network for transformer-based models

Cited by: 2
Authors
Tang, Xue-song [1, 3]
Hao, Kuangrong [1, 3, 4]
Wei, Hui [2, 5]
Affiliations
[1] 2999 Renmin North Rd, Shanghai 201620, Peoples R China
[2] 2005 Songhu Rd, Shanghai 200434, Peoples R China
[3] Donghua Univ, Coll Informat Sci & Technol, Shanghai, Peoples R China
[4] Minist Educ, Engn Res Ctr Digitized Text Apparel Technol, Shanghai, Peoples R China
[5] Fudan Univ, Sch Comp Sci, Lab Algorithms Cognit Models, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
Transformers; Dorsal pathway modeling; Image classification; Position embedding; Zero padding;
DOI
10.1016/j.neunet.2023.07.015
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Owing to the progress of transformer-based networks, there have been significant improvements in the performance of vision models in recent years. However, there is further potential for improvement in positional embeddings that play a crucial role in distinguishing information across different positions. Based on the biological mechanisms of human visual pathways, we propose a positional embedding network that adaptively captures position information by modeling the dorsal pathway, which is responsible for spatial perception in human vision. Our proposed double-stream architecture leverages large zero-padding convolutions to learn local positional features and utilizes transformers to learn global features, effectively capturing the interaction between dorsal and ventral pathways. To evaluate the effectiveness of our method, we implemented experiments on various datasets, employing differentiated designs. Our statistical analysis demonstrates that the simple implementation significantly enhances image classification performance, and the observed trends demonstrate its biological plausibility. © 2023 Elsevier Ltd. All rights reserved.
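The abstract's reliance on zero-padding convolutions for local positional features can be illustrated with a small sketch. This is not the authors' network: the function name `conv2d_zero_pad`, the 5x5 constant image, and the all-ones 3x3 kernel are assumptions chosen only to show the general effect that zero padding makes a convolution's response depend on distance to the image border.

```python
# Illustrative sketch only; not the architecture from the paper.
# It shows that a zero-padded convolution responds differently near
# the border, so its output carries a position cue.

def conv2d_zero_pad(image, kernel):
    """Same-size 2D cross-correlation with zero padding (odd square kernel)."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    # Pixels outside the image are treated as zero.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out

# A constant image has no content that could reveal location.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]  # hypothetical all-ones kernel

response = conv2d_zero_pad(image, kernel)
for row in response:
    print(row)
```

With these assumed inputs, corner, edge, and interior positions produce different values even though every input pixel is identical. A larger kernel with correspondingly larger padding widens the band of positions whose response differs from the interior value.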
Pages: 204-214
Page count: 11