Generalization potential of large language models

Cited by: 3
Authors
Mikhail Budnikov [1]
Anna Bykova [2]
Ivan P. Yamshchikov [3]
Affiliations
[1] Constructor University, Bremen
[2] LEYA, Higher School of Economics, St. Petersburg
[3] THWS, CAIRO, Würzburg
Keywords
Generalization; Large language models; Semantic information
DOI
10.1007/s00521-024-10827-6
Abstract
The rise of deep learning and especially the advent of large language models (LLMs) have intensified discussion of what artificial intelligence with greater generalization capability entails. Opinions on the capabilities of LLMs span an extremely broad range, from equating language models with stochastic parrots to claiming that they are already conscious. This paper reviews the LLM landscape in terms of generalization capacity understood as an information-theoretic property of these complex systems. We discuss the theoretical explanations that have been proposed for generalization in LLMs and highlight the mechanisms plausibly responsible for it. Through an examination of the existing literature and theoretical frameworks, we aim to provide insight into the mechanisms driving the generalization capacity of LLMs, contributing to a deeper understanding of their capabilities and limitations in natural language processing tasks. © The Author(s) 2024.
Pages: 1973-1997
Page count: 24