What does the language system look like in pre-trained language models? A study using complex networks

Cited by: 1
Author
Zheng, Jianyu [1 ]
Affiliation
[1] Tsinghua Univ, Dept Chinese Language & Literature, Beijing 100084, Peoples R China
Keywords
Language model; BERT; Complex network; Language system
DOI
10.1016/j.knosys.2024.111984
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities that PLMs exhibit on NLP tasks have attracted researchers to explore the factors underlying their success. However, most existing work focuses on specific pieces of linguistic knowledge encoded in PLMs rather than on how these models comprehend language from a holistic perspective, and it does not explain how PLMs organize the language system as a whole. We therefore adopt a complex network approach to represent the language system and investigate how language elements are organized within it. Specifically, we study the attention relationships among words generated by the attention heads of BERT models: words are treated as nodes, and the connections between words and the words they attend to most are represented as edges. After constructing these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. Several constructive conclusions are drawn: (1) the English attention networks demonstrate exceptional performance in organizing words; (2) most words' attention networks exhibit the small-world property and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance among language groups; (4) in the cross-layer analysis, the networks from layers 8 to 10 of Chinese BERT and layers 6 to 9 of English BERT exhibit more consistent characteristics. Our study provides a comprehensive account of how PLMs organize the language system, which can be used to evaluate and develop improved models.
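As a rough illustration of the construction described in the abstract, the sketch below builds one such attention network for a single sentence using the Hugging Face transformers library and networkx. The model name, the choice of layer and head, the use of WordPiece tokens in place of whole words, and the metrics reported are illustrative assumptions, not the authors' exact experimental setup.

# Minimal sketch (assumed details): build a "word attention network" for one
# sentence from one BERT attention head, then compute basic network metrics.
import networkx as nx
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

text = "Pre-trained language models have advanced natural language processing."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
layer, head = 8, 0  # illustrative choice, not the paper's setting
attn = outputs.attentions[layer][0, head]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Nodes are tokens (a word-level version would first merge WordPiece pieces);
# each token gets an edge to the token it attends to most strongly.
G = nx.DiGraph()
G.add_nodes_from(range(len(tokens)))
for i in range(len(tokens)):
    j = int(attn[i].argmax())
    if i != j:
        G.add_edge(i, j)

# Small-world-style metrics on the largest connected component of the
# undirected view, plus the degree sequence for a scale-free check.
U = G.to_undirected()
largest = U.subgraph(max(nx.connected_components(U), key=len))
print("clustering coefficient:", nx.average_clustering(largest))
print("avg shortest path length:", nx.average_shortest_path_length(largest))
print("degree sequence:", sorted((d for _, d in largest.degree()), reverse=True))

On a full corpus, the same construction would be repeated for every head and layer, and the resulting clustering coefficients, path lengths, and degree distributions compared across languages and layers, as the abstract describes.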
Pages: 11
Related papers
50 items in total
  • [41] English-Assamese neural machine translation using prior alignment and pre-trained language model
    Laskar, Sahinur Rahman
    Paul, Bishwaraj
    Dadure, Pankaj
    Manna, Riyanka
    Pakray, Partha
    Bandyopadhyay, Sivaji
    COMPUTER SPEECH AND LANGUAGE, 2023, 82
  • [42] Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code
    Tagarelli, Andrea
    Simeri, Andrea
    ARTIFICIAL INTELLIGENCE AND LAW, 2022, 30 (03) : 417 - 473
  • [44] Learning to Predict US Policy Change Using New York Times Corpus with Pre-Trained Language Model
    Zhang, Guoshuai
    Wu, Jiaji
    Tan, Mingzhou
    Yang, Zhongjie
    Cheng, Qingyu
    Han, Hong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (45-46) : 34227 - 34240
  • [45] Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
    Christian, Hans
    Suhartono, Derwin
    Chowanda, Andry
    Zamli, Kamal Z.
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [47] Framing and BERTology: A Data-Centric Approach to Integration of Linguistic Features into Transformer-Based Pre-trained Language Models
    Avetisyan, Hayastan
    Safikhani, Parisa
    Broneske, David
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2023, 2024, 825 : 81 - 90
  • [48] Incorporation of company-related factual knowledge into pre-trained language models for stock-related spam tweet filtering
    Park, Jihye
    Cho, Sungzoon
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [49] Puer at SemEval-2024 Task 4: Fine-tuning Pre-trained Language Models for Meme Persuasion Technique Detection
    Dao, Jiaxu
    Li, Zhuoying
    Su, Youbang
    Gong, Wensheng
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 64 - 69
  • [50] FedITD: A Federated Parameter-Efficient Tuning With Pre-Trained Large Language Models and Transfer Learning Framework for Insider Threat Detection
    Wang, Zhi Qiang
    Wang, Haopeng
    El Saddik, Abdulmotaleb
    IEEE ACCESS, 2024, 12 : 160396 - 160417