Emergent linguistic structure in artificial neural networks trained by self-supervision

Cited by: 144
Authors
Manning, Christopher D. [1 ]
Clark, Kevin [1 ]
Hewitt, John [1 ]
Khandelwal, Urvashi [1 ]
Levy, Omer [2 ]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
[2] Facebook Inc, Facebook Artificial Intelligence Res, Seattle, WA 98109 USA
Keywords
artificial neural network; self-supervision; syntax; learning; LANGUAGE; ACQUISITION
DOI
10.1073/pnas.1907367117
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract
This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
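The two-step idea summarized in the abstract — a linear "structural probe" whose squared distances estimate parse-tree distances, followed by decoding a tree from those distances — can be sketched on synthetic data. Everything below (the toy 5-word tree, the random embeddings `H`, and the random transform `B`) is a hypothetical stand-in; in the paper the transform is learned from real contextual embeddings against treebank parses, and here we decode from gold tree distances only to illustrate the decoding step.

```python
import numpy as np

def probe_distances(B, H):
    """Structural-probe distance d(i, j) = ||B(h_i - h_j)||^2 for all word pairs."""
    Z = H @ B.T                                # project embeddings with linear map B
    diff = Z[:, None, :] - Z[None, :, :]       # pairwise differences, shape (n, n, rank)
    return (diff ** 2).sum(-1)                 # squared L2 distances, shape (n, n)

def mst_edges(D):
    """Prim's algorithm: minimum spanning tree over a dense distance matrix."""
    n = D.shape[0]
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # Cheapest edge crossing the cut between tree and non-tree vertices.
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: D[e])
        edges.add(frozenset((i, j)))
        in_tree.add(j)
    return edges

# Toy dependency tree over 5 "words": edges 0-1, 1-2, 1-3, 3-4.
n = 5
tree = {frozenset(e) for e in [(0, 1), (1, 2), (1, 3), (3, 4)]}

# Gold tree (path) distances via Floyd-Warshall over the edge set.
T = np.full((n, n), np.inf)
np.fill_diagonal(T, 0.0)
for e in tree:
    i, j = tuple(e)
    T[i, j] = T[j, i] = 1.0
for k in range(n):
    T = np.minimum(T, T[:, [k]] + T[[k], :])

# A (random, untrained) probe still yields a valid distance matrix.
rng = np.random.default_rng(0)
H = rng.normal(size=(n, 16))      # stand-in for contextual word embeddings
B = rng.normal(size=(4, 16))      # the linear transform; learned in the paper
D_hat = probe_distances(B, H)

# If the probe matched T well, the MST over D_hat would recover the parse;
# decoding from the gold distances shows that the MST step is exact.
recovered = mst_edges(T)
```

A trained probe replaces the random `B` with one fit so that `probe_distances(B, H)` approximates `T` across a treebank; the MST decoding step is then applied to the predicted distances exactly as above.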
Pages: 30046-30054
Page count: 9
Related Papers
50 records
  • [42] Survey and critique of techniques for extracting rules from trained artificial neural networks
    Andrews, R
    Diederich, J
    Tickle, AB
    KNOWLEDGE-BASED SYSTEMS, 1995, 8 (06) : 373 - 389
  • [43] Structure from Motion by Artificial Neural Networks
    Schoening, Julius
    Behrens, Thea
    Faion, Patrick
    Kheiri, Peyman
    Heidemann, Gunther
    Krumnack, Ulf
    IMAGE ANALYSIS, SCIA 2017, PT I, 2017, 10269 : 146 - 158
  • [44] A novel type of activation function in artificial neural networks: Trained activation function
    Ertugrul, Omer Faruk
    NEURAL NETWORKS, 2018, 99 : 148 - 157
  • [45] Creep modelling of polypropylenes using artificial neural networks trained with Bee algorithms
    Dugenci, Muharrem
    Aydemir, Alpay
    Esen, Ismail
    Aydin, Mehmet Emin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 45 : 71 - 79
  • [46] Feasibility of employing artificial neural networks for emergent crop monitoring in SAR systems
    Ghinelli, BMG
    Bennett, JC
    IEE PROCEEDINGS-RADAR SONAR AND NAVIGATION, 1998, 145 (05) : 291 - 296
  • [47] On the neural networks of self and other bias and their role in emergent social interactions
    Forbes, Chad E.
    CORTEX, 2024, 177 : 113 - 129
  • [48] Self-configuration Using Artificial Neural Networks
    Ather, Maleeha
    Khan, Malik Jahan
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, 2010, 93 : 16 - +
  • [49] Emergent perceptual biases from state-space geometry in trained spiking recurrent neural networks
    Serrano-Fernandez, Luis
    Beiran, Manuel
    Parga, Nestor
    CELL REPORTS, 2024, 43 (07):
  • [50] Communication Channel Equalization Based on Levenberg-Marquardt Trained Artificial Neural Networks
    Ghadjati, M.
    Moussaoui, A. K.
    Bouchemel, A.
    2013 3RD INTERNATIONAL CONFERENCE ON SYSTEMS AND CONTROL (ICSC), 2013