Recasting Self-Attention with Holographic Reduced Representations

Cited by: 0
Authors
Alam, Mohammad Mahmudul [1]
Raff, Edward [1,2,3]
Biderman, Stella [2,3,4]
Oates, Tim [1]
Holt, James [2]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Lab Phys Sci, College Pk, MD 20740 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] EleutherAI, New York, NY USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202 | 2023, Vol. 202
Keywords
DETECT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths, the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T >= 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy as standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits, including O(TH log H) time complexity, O(TH) space complexity, and convergence in 10x fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks, and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280x faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
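The mechanism the abstract describes can be sketched concretely. Below is a minimal, self-contained Python/NumPy illustration of the HRR primitive the paper builds on: binding each key-value pair with circular convolution (computed via the FFT), superposing all bound pairs into a single trace, and unbinding an approximate response for each query. This is a toy sketch, not the authors' Hrrformer layer (the full model adds components such as learned projections, a weighting over values, and multi-head structure; see the linked repository for the real implementation), and the names fft_bind, approx_inverse, and hrr_attention_sketch are hypothetical. It does, however, exhibit the complexities quoted above, O(TH log H) time and O(TH) memory, since no T x T attention matrix is ever formed.

import numpy as np

def fft_bind(x, y):
    # HRR binding (circular convolution) via FFT: O(H log H) per vector.
    return np.fft.irfft(np.fft.rfft(x, axis=-1) * np.fft.rfft(y, axis=-1),
                        n=x.shape[-1], axis=-1)

def approx_inverse(y):
    # HRR approximate inverse: the index-reversal permutation y[-j mod H].
    return np.roll(y[..., ::-1], 1, axis=-1)

def hrr_attention_sketch(Q, K, V):
    # Superpose all T bound key-value pairs into one H-dim trace, then let
    # each query unbind its (noisy) response. Shapes: (T, H) -> (T, H).
    s = fft_bind(K, V).sum(axis=0)          # (H,) trace of all key-value bindings
    return fft_bind(approx_inverse(Q), s)   # (T, H) one retrieved response per query

# Toy check: with queries equal to keys, each retrieved row should correlate
# most strongly with its own value vector.
rng = np.random.default_rng(0)
T, H = 8, 1024
Q = K = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
V = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
R = hrr_attention_sketch(Q, K, V)
print((R * V).sum(axis=-1))  # ~1 per row; cross terms in R @ V.T stay near 0

Retrieval from the superposition is only approximate: the noise terms shrink as H grows, which is the trade that replaces a quadratic attention matrix with a fixed-size trace.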
Pages: 490-507
Number of pages: 18