16 entries in total
[1]
He K., Zhang X., Ren S., et al., Deep residual learning for image recognition, Proceedings of Computer Vision and Pattern Recognition, pp. 770-778, (2016)
[2]
Devlin J., Chang M., Lee K., et al., BERT: pre-training of deep bidirectional transformers for language understanding, (2019)
[3]
Abadi M., Barham P., Chen J., et al., TensorFlow: a system for large-scale machine learning, (2016)
[4]
Chen T., Li M., Li Y., et al., MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems, (2015)
[5]
Chen T., Moreau T., Jiang Z., et al., TVM: an automated end-to-end optimizing compiler for deep learning, (2018)
[6]
Chetlur S., Woolley C., Vandermersch P., et al., cuDNN: efficient primitives for deep learning, (2014)
[7]
Chen T., Du Z., Sun N., et al., DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, Architectural Support for Programming Languages and Operating Systems, pp. 269-284, (2014)
[8]
Du Z., Fasthuber R., Chen T., et al., ShiDianNao: shifting vision processing closer to the sensor, International Symposium on Computer Architecture, 43, 3, pp. 92-104, (2015)
[9]
Jouppi N.P., Young C.S., Patil N., et al., In-datacenter performance analysis of a tensor processing unit, International Symposium on Computer Architecture, pp. 1-12, (2017)
[10]
Liu S., Du Z., Tao J., et al., Cambricon: an instruction set architecture for neural networks, International Symposium on Computer Architecture, pp. 393-405, (2016)