[101] Devlin J., Chang M., Lee K., Toutanova K., BERT: Pre-training of deep bidirectional transformers for language understanding, Proceedings of NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Vol. 1, pp. 4171-4186, (2019)
[102] Radford A., Narasimhan K., Salimans T., Sutskever I., Improving language understanding by generative pre-training, (2018)
[103] Radford A., Wu J., Child R., Luan D., Amodei D., Sutskever I., Language models are unsupervised multitask learners, (2019)
[104] Brown T., Mann B., Ryder N., Subbiah M., Kaplan J.D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Et al., Language models are few-shot learners, Adv. Neural Inf. Process. Syst., 33, pp. 1877-1901, (2020)
[105] Raffel C., Shazeer N., Roberts A., Lee K., Narang S., Matena M., Zhou Y., Li W., Liu P.J., Exploring the limits of transfer learning with a unified text-to-text transformer, J. Mach. Learn. Res., 21, 140, pp. 1-67, (2020)
[106] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention is all you need, Adv. Neural Inf. Process. Syst., 30, (2017)
[107] Han K., Wang Y., Chen H., Chen X., Guo J., Liu Z., Tang Y., Xiao A., Xu C., Xu Y., Et al., A survey on vision transformer, IEEE Trans. Pattern Anal. Mach. Intell., 45, 1, pp. 87-110, (2022)
[108] Ba J.L., Kiros J.R., Hinton G.E., Layer normalization, (2016)
[109] Wen Q., Zhou T., Zhang C., Chen W., Ma Z., Yan J., Sun L., Transformers in time series: A survey, Proc. Thirty-Second Int. Joint Conf. Artif. Intell., IJCAI-23, pp. 6778-6786, (2023)
[110] Mienye I.D., Jere N., Deep learning for credit card fraud detection: A review of algorithms, challenges, and solutions, IEEE Access, 12, (2024)