Emergent linguistic structure in artificial neural networks trained by self-supervision

Cited by: 144
Authors
Manning, Christopher D. [1]
Clark, Kevin [1]
Hewitt, John [1]
Khandelwal, Urvashi [1]
Levy, Omer [2]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
[2] Facebook Inc, Facebook Artificial Intelligence Res, Seattle, WA 98109 USA
Keywords
artificial neural network; self-supervision; syntax; learning; LANGUAGE; ACQUISITION
DOI
10.1073/pnas.1907367117
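Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract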
This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
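The abstract's claim that a linear transformation of embeddings captures parse tree distances can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the three-word toy sentence, its hand-picked 2-dimensional "embeddings", the identity matrix used as the linear map, and the choice to compare the squared distance after the map with the number of edges between two words in the tree.

```python
from collections import deque


def probe_distance_sq(B, h_i, h_j):
    """Squared L2 distance between two embeddings after the linear map B.

    B is a list of rows; h_i and h_j are equal-length lists of floats.
    """
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def tree_distance(edges, n, src, dst):
    """Number of edges on the path between two words in an undirected tree."""
    adj = {k: [] for k in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {src: 0}
    queue = deque([src])
    while queue:  # breadth-first search from src
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("nodes are not connected")


# Hypothetical toy sentence "the chef cooked": the-chef and chef-cooked edges.
EDGES = [(0, 1), (1, 2)]
# Hand-picked toy embeddings (not model outputs) and an identity linear map.
H = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]

for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, tree_distance(EDGES, 3, i, j), probe_distance_sq(B, H[i], H[j]))
```

In this toy setup the squared distances agree with the tree distances exactly because the embeddings were chosen that way; with real model embeddings, B would have to be learned and the agreement would only be approximate.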
Pages: 30046-30054
Page count: 9