- [1] Peters M, Neumann M, Iyyer M, et al., Deep contextualized word representations, Proceedings of the NAACL-HLT, pp. 2227-2237, (2018)
- [2] Devlin J, Chang M W, Lee K, et al., BERT: pre-training of deep bidirectional transformers for language understanding, Proceedings of the NAACL-HLT, pp. 4171-4186, (2019)
- [3] Radford A, Narasimhan K, Salimans T, et al., Improving language understanding by generative pre-training [R/OL], (2018)
- [4] Brown T B, Mann B, Ryder N, et al., Language models are few-shot learners, Proceedings of the NeurIPS, pp. 1877-1901, (2020)
- [5] Imamura K, Sumita E., Recycling a pre-trained BERT encoder for neural machine translation, Proceedings of the EMNLP & NGT, pp. 23-31, (2019)
- [6] Kim Y, Rush A M., Sequence-level knowledge distillation, Proceedings of the EMNLP, pp. 1317-1327, (2016)
- [7] Hinton G, Vinyals O, Dean J., Distilling the knowledge in a neural network [EB/OL], (2015)
- [8] Weng R, Yu H, Huang S, et al., Acquiring knowledge from pre-trained model to neural machine translation, Proceedings of the AAAI, pp. 9266-9273, (2020)
- [9] Yang J, Wang M, Zhou H, et al., Towards making the most of BERT in neural machine translation, Proceedings of the AAAI, pp. 9378-9385, (2020)
- [10] Chen Y C, Gan Z, Cheng Y, et al., Distilling knowledge learned in BERT for text generation, Proceedings of the ACL, pp. 7893-7905, (2020)