ReViT: Vision Transformer Accelerator With Reconfigurable Semantic-Aware Differential Attention

Times Cited: 0
Authors
Zou, Xiaofeng [1 ]
Chen, Cen [1 ,2 ]
Shao, Hongen [1 ]
Wang, Qinyu [1 ]
Zhuang, Xiaobin [1 ]
Li, Yangfan [3 ]
Li, Keqin [4 ]
Affiliations
[1] South China Univ Technol, Sch Future Technol, Guangzhou 510641, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantics; Transformers; Visualization; Computer vision; Computational modeling; Attention mechanisms; Dogs; Computers; Snow; Performance evaluation; Hardware accelerator; vision transformers; software-hardware co-design; HIERARCHIES;
DOI
10.1109/TC.2024.3504263
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology];
Discipline code
0812;
Abstract
While vision transformers (ViTs) have continued to achieve new milestones in computer vision, their complicated network architectures with high computation and memory costs have hindered their deployment on resource-limited edge devices. Some customized accelerators have been proposed to accelerate the execution of ViTs, achieving improved performance with reduced energy consumption. However, these approaches utilize flattened attention mechanisms and ignore the inherent hierarchical visual semantics in images. In this work, we conduct a thorough analysis of hierarchical visual semantics in real-world images, revealing opportunities and challenges of leveraging visual semantics to accelerate ViTs. We propose ReViT, a systematic algorithm and architecture co-design approach that exploits visual semantics to accelerate ViTs. Our algorithm leverages the strong feature similarity among tokens of the same semantic class to reduce computation and communication through a differential attention mechanism, and supports semantic-aware attention efficiently. A novel dedicated architecture is designed to support the proposed algorithm and translate it into performance improvements. Moreover, we propose an efficient execution dataflow to alleviate workload imbalance and maximize hardware utilization. ReViT opens new directions for accelerating ViTs by exploring the underlying visual semantics of images. ReViT achieves an average 2.3x speedup and 3.6x higher energy efficiency over state-of-the-art ViT accelerators.
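The abstract's core idea, that tokens of the same semantic class have strongly similar features and therefore need not each pay for a full attention row, can be illustrated with a toy sketch. The paper's exact differential attention algorithm is not given in this record, so the code below is a hedged reconstruction under a simple assumption: each semantic group is served by one representative query (the group mean), whose attention output is shared by all group members. The names `semantic_attention` and `labels` are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_attention(Q, K, V, labels):
    """Toy semantic-aware attention (illustrative, not the ReViT algorithm).

    Tokens sharing a semantic label are assumed to have highly similar
    queries, so one attention row is computed per group (using the group's
    mean query) and shared by all members. This shrinks the score matrix
    from n x n to g x n, where g is the number of semantic groups.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(V, dtype=float)
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        q_rep = Q[idx].mean(axis=0)              # representative query for the group
        weights = softmax(q_rep @ K.T * scale)   # one score row instead of len(idx)
        out[idx] = weights @ V                   # shared output for all group members
    return out
```

When every token is its own group, the sketch degenerates to exact attention; the savings, and the approximation error, grow as more tokens share a label, which mirrors the computation/accuracy trade-off the abstract describes.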
Pages: 1079-1093
Page count: 15