29 references in total
- [1] DEVLIN J, CHANG M W, LEE K, et al., BERT: pre-training of deep bidirectional transformers for language understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), (2019)
- [2] BROWN T B, MANN B, RYDER N, et al., Language models are few-shot learners, Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), (2020)
- [3] WANG H R, ZHANG Z K, HAN S., SpAtten: efficient sparse attention architecture with cascade token and head pruning, Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA), (2021)
- [4] MIKOLOV T, KARAFIAT M, BURGET L, et al., Recurrent neural network based language model, Proceedings of the 11th Annual Conference of the International Speech Communication Association, (2010)
- [5] GRAVES A., Long short-term memory, Supervised sequence labelling with recurrent neural networks, pp. 37-45, (2012)
- [6] VASWANI A, SHAZEER N, PARMAR N, et al., Attention is all you need, Proceedings of the 31st Conference on Neural Information Processing Systems, (2017)
- [7] LU L Q, JIN Y C, BI H R, et al., Sanger: a co-design framework for enabling sparse attention using reconfigurable architecture, Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture, (2021)
- [8] ZAFRIR O, BOUDOUKH G, IZSAK P, et al., Q8BERT: quantized 8 bit BERT, Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS), (2019)
- [9] SHEN S, DONG Z, YE J Y, et al., Q-BERT: Hessian based ultra low precision quantization of BERT, Proceedings of the AAAI Conference on Artificial Intelligence, (2020)
- [10] ZADEH A H, EDO I, AWAD O M, et al., GOBO: quantizing attention-based NLP models for low latency and energy efficient inference, Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), (2020)