Generalization potential of large language models

Cited by: 3
Authors
Mikhail Budnikov [1 ]
Anna Bykova [2 ]
Ivan P. Yamshchikov [1, 3]
Affiliations
[1] Constructor University, Bremen
[2] LEYA, Higher School of Economics, St. Petersburg
[3] THWS, CAIRO, Würzburg
Keywords
Generalization; Large language models; Semantic information
DOI
10.1007/s00521-024-10827-6
Abstract
The rise of deep learning techniques, and especially the advent of large language models (LLMs), has intensified discussion of the possibilities entailed by artificial intelligence with greater generalization capability. The range of opinions on the capabilities of LLMs is extremely broad: from equating language models with stochastic parrots to claiming that they are already conscious. This paper attempts to review the LLM landscape in the context of generalization capacity as an information-theoretic property of these complex systems. We discuss the theoretical explanations that have been suggested for generalization in LLMs and highlight the mechanisms that may be responsible for these generalization properties. Through an examination of the existing literature and theoretical frameworks, we aim to provide insight into the mechanisms driving the generalization capacity of LLMs, thus contributing to a deeper understanding of their capabilities and limitations in natural language processing tasks. © The Author(s) 2024.
Pages: 1973-1997
Number of pages: 24