Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention

Cited: 0
Authors
Liu, Kai [1 ]
Wu, Tianyi [2 ,3 ]
Liu, Cong [1 ]
Guo, Guodong [2 ,3 ]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Baidu Res, Inst Deep Learning, Beijing, Peoples R China
[3] Natl Engn Lab Deep Learning Technol & Applicat, Beijing, Peoples R China
Source
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022 | 2022
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computational complexity caused by each query attending to all keys/values, various methods constrain attention to local regions, where each query attends only to keys/values within a hand-crafted window. However, these hand-crafted window partition mechanisms are data-agnostic and ignore the input content, so a query may attend to irrelevant keys/values. To address this issue, we propose Dynamic Group Attention (DG-Attention), which dynamically divides all queries into multiple groups and selects the most relevant keys/values for each group. DG-Attention can flexibly model more relevant dependencies without the spatial constraints used in hand-crafted window-based attention. Built on DG-Attention, we develop a general vision transformer backbone named Dynamic Group Transformer (DGT). Extensive experiments show that our models outperform state-of-the-art methods on multiple common vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation.
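The grouping-then-selection idea in the abstract can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's actual algorithm: group prototypes are seeded from the first few queries and keys are ranked against the group centroid, whereas the paper's grouping and selection mechanisms are learned.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dg_attention(Q, K, V, num_groups=2, top_k=4):
    """Rough sketch of dynamic group attention: queries are partitioned into
    groups, and each group attends only to its top-k most relevant keys/values
    instead of a fixed spatial window."""
    N, d = Q.shape
    # Hypothetical grouping: seed prototypes from the first `num_groups`
    # queries and assign each query to its most similar prototype.
    prototypes = Q[:num_groups]                    # (G, d)
    assign = (Q @ prototypes.T).argmax(axis=1)     # (N,) group id per query
    out = np.zeros_like(Q)
    for g in range(num_groups):
        idx = np.where(assign == g)[0]
        if idx.size == 0:
            continue
        centroid = Q[idx].mean(axis=0)             # group summary vector
        # Select the top-k keys most relevant to this group's centroid.
        sel = np.argsort(K @ centroid)[-top_k:]
        # Standard scaled dot-product attention, restricted to selected keys.
        attn = softmax((Q[idx] @ K[sel].T) / np.sqrt(d), axis=-1)
        out[idx] = attn @ V[sel]
    return out
```

Because each group attends to only `top_k` keys rather than all `N`, the cost per group scales with the group size times `top_k`, which is the same motivation as window attention but with a data-dependent rather than hand-crafted key set.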
Pages: 1187-1193 (7 pages)