Kronecker Attention Networks

Cited by: 8
Authors
Gao, Hongyang [1]
Wang, Zhengyang [1]
Ji, Shuiwang [1]
Affiliation
[1] Texas A&M University, College Station, TX 77843, USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Funding
U.S. National Science Foundation;
Keywords
Attention; neural networks; Kronecker attention; image classification; image segmentation;
DOI
10.1145/3394486.3403065
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention operators have been applied to both 1-D data such as text and higher-order data such as images and videos. Applying attention operators to high-order data requires flattening the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive computational cost, but also fails to preserve the structure of the data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this new view, we develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly. More importantly, the proposed KAOs lead to dramatic reductions in computational resources. Experimental results show that our methods reduce the amount of required computational resources by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving performance competitive with models using the original attention operators.
Pages: 229-237
Page count: 9