What does the language system look like in pre-trained language models? A study using complex networks

Cited by: 1
Authors
Zheng, Jianyu [1 ]
Affiliations
[1] Tsinghua Univ, Dept Chinese Language & Literature, Beijing 100084, Peoples R China
Keywords
Language model; BERT; Complex network; Language system;
DOI
10.1016/j.knosys.2024.111984
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities that PLMs exhibit on NLP tasks have attracted researchers to explore the factors underlying their success. However, most existing work focuses on specific pieces of linguistic knowledge encoded in PLMs rather than investigating how these models comprehend language from a holistic perspective, and it cannot explain how PLMs organize the language system as a whole. We therefore adopt a complex-network approach to represent the language system and investigate how language elements are organized within it. Specifically, we study the attention relationships among words generated by the attention heads of BERT models: words are treated as nodes, and the connection between each word and the word it attends to most is represented as an edge. After constructing these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. We draw several constructive conclusions, including: (1) the English attention networks are exceptionally effective at organizing words; (2) most words' attention networks exhibit small-world properties and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance across language groups; and (4) in cross-layer analysis, the networks from layers 8 to 10 of Chinese BERT and layers 6 to 9 of English BERT exhibit more consistent characteristics. Our study provides a comprehensive account of how PLMs organize the language system, which can be used to evaluate and develop improved models.
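The construction described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' implementation; it is a minimal illustration, assuming the `bert-base-uncased` checkpoint, an arbitrary layer/head choice, and a toy sentence, of how one might turn a BERT attention head into a directed word network (each token linked to the token it attends to most) and compute basic metrics with networkx.

```python
# Minimal sketch (not the paper's code): build a "word attention network" from one
# BERT attention head and compute simple network metrics. The model name, the
# layer/head indices, and the example sentence are illustrative assumptions.
import networkx as nx
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "Complex networks offer a holistic view of the language system."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

layer, head = 8, 0                            # assumed choice of attention head
attn = outputs.attentions[layer][0, head]     # (seq_len, seq_len) attention matrix
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Directed attention network: token i -> the token it attends to most
# (self-attention excluded), following the "most-attending word" rule.
G = nx.DiGraph()
G.add_nodes_from((i, {"token": t}) for i, t in enumerate(tokens))
for i in range(len(tokens)):
    weights = attn[i].clone()
    weights[i] = 0.0                          # ignore attention to itself
    j = int(torch.argmax(weights))
    G.add_edge(i, j, weight=float(weights[j]))

# Basic metrics of the kind analyzed in the paper (clustering, degree distribution).
UG = G.to_undirected()
print("average clustering:", nx.average_clustering(UG))
print("degree sequence:", sorted(d for _, d in UG.degree()))
```

In the paper, such networks are built per attention head and per layer over large corpora; this toy example only shows the node-and-edge construction and metric computation for a single sentence.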
Pages: 11
Related papers
50 records in total
  • [21] Addressing Extraction and Generation Separately: Keyphrase Prediction With Pre-Trained Language Models
    Liu, Rui
    Lin, Zheng
    Wang, Weiping
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3180 - 3191
  • [22] LMs go Phishing: Adapting Pre-trained Language Models to Detect Phishing Emails
    Misra, Kanishka
    Rayz, Julia Taylor
    2022 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT, 2022, : 135 - 142
  • [23] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    Qiang, Jipeng
    Zhang, Feng
    Li, Yun
    Yuan, Yunhao
    Zhu, Yi
    Wu, Xindong
    FRONTIERS OF COMPUTER SCIENCE, 2023, 17 (01)
  • [26] Aspect-Based Sentiment Analysis in Hindi Language by Ensembling Pre-Trained mBERT Models
    Pathak, Abhilash
    Kumar, Sudhanshu
    Roy, Partha Pratim
    Kim, Byung-Gyu
    ELECTRONICS, 2021, 10 (21)
  • [27] A Comparison of Pre-Trained Language Models for Multi-Class Text Classification in the Financial Domain
    Arslan, Yusuf
    Allix, Kevin
    Veiber, Lisa
    Lothritz, Cedric
    Bissyande, Tegawende F.
    Klein, Jacques
    Goujon, Anne
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 260 - 268
  • [28] Aspect-Based Sentiment Analysis of Social Media Data With Pre-Trained Language Models
    Troya, Anina
    Pillai, Reshmi Gopalakrishna
    Rivero, Cristian Rodriguez
    Genc, Zulkuf
    Kayal, Subhradeep
    Araci, Dogu
    2021 5TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2021, 2021, : 8 - 17
  • [29] Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks
    Ballout, Mohamad
    Krumnack, Ulf
    Heidemann, Gunther
    Kuehnberger, Kai-Uwe
    EXPLAINABLE ARTIFICIAL INTELLIGENCE, XAI 2023, PT III, 2023, 1903 : 3 - 25
  • [30] Disfluencies and Fine-Tuning Pre-trained Language Models for Detection of Alzheimer's Disease
    Yuan, Jiahong
    Bian, Yuchen
    Cai, Xingyu
    Huang, Jiaji
    Ye, Zheng
    Church, Kenneth
    INTERSPEECH 2020, 2020, : 2162 - 2166