What does the language system look like in pre-trained language models? A study using complex networks

Cited by: 1
Author
Zheng, Jianyu [1]
Affiliation
[1] Tsinghua Univ, Dept Chinese Language & Literature, Beijing 100084, Peoples R China
Keywords
Language model; BERT; Complex network; Language system
DOI
10.1016/j.knosys.2024.111984
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities that PLMs exhibit on NLP tasks have attracted researchers to explore the factors underlying their success. However, most existing work focuses on specific pieces of linguistic knowledge encoded in PLMs rather than investigating how these models comprehend language from a holistic perspective; in particular, it cannot explain how PLMs organize the language system as a whole. We therefore adopt a complex-network approach to represent the language system and investigate how language elements are organized within it. Specifically, we take as our research object the attention relationships among words generated by the attention heads in BERT models: words are treated as nodes, and the connections between words and their most-attending words are represented as edges. After obtaining these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. Several constructive conclusions emerge: (1) the English attention networks demonstrate exceptional performance in organizing words; (2) most words' attention networks exhibit the small-world property and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance among language groups; and (4) in the cross-layer analysis, the networks from layers 8 to 10 in Chinese BERT and layers 6 to 9 in English BERT exhibit more consistent characteristics. Our study provides a comprehensive explanation of how PLMs organize language systems, which can be used to evaluate and develop improved models.
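To make the construction in the abstract concrete, the sketch below builds one such "words' attention network" with HuggingFace Transformers and NetworkX and computes a few of the metrics mentioned above. It is a minimal sketch under stated assumptions, not the paper's exact pipeline: the model name (bert-base-uncased), the example sentence, and the layer/head indices are illustrative, the network is built over the subword tokens of a single sentence rather than words across a corpus, and "most-attending word" is read here as the token that each token attends to most strongly.

# Minimal sketch (assumptions noted above): one BERT attention head is turned
# into a directed network whose nodes are token positions and whose edges point
# from each token to the token it attends to most strongly.
import networkx as nx
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "Complex networks can describe how words are organized in a language system."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer.
layer, head = 8, 0                          # illustrative indices only
attn = outputs.attentions[layer][0, head]   # shape (seq_len, seq_len)

G = nx.DiGraph()
for i, tok in enumerate(tokens):
    G.add_node(i, token=tok)
for i in range(len(tokens)):
    j = int(attn[i].argmax())               # strongest attention target of token i
    if j != i:                              # ignore self-attention edges
        G.add_edge(i, j, weight=float(attn[i, j]))

# Basic network metrics; assessing small-world or scale-free structure would
# additionally require random-graph baselines and a degree-distribution fit.
und = G.to_undirected()
print("nodes:", und.number_of_nodes(), "edges:", und.number_of_edges())
print("average clustering:", nx.average_clustering(und))
if nx.is_connected(und):
    print("average shortest path length:", nx.average_shortest_path_length(und))

In the paper's setting, such networks are presumably built per attention head over much larger text samples before the metrics (clustering, path length, degree distribution) are compared across layers and languages; the snippet only shows the single-sentence skeleton of that procedure.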
Pages: 11
Related Papers (50 in total)
  • [1] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121
  • [2] A Comparative Study of Using Pre-trained Language Models for Toxic Comment Classification
    Zhao, Zhixue
    Zhang, Ziqi
    Hopfgartner, Frank
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 500 - 507
  • [3] Pre-trained language models in medicine: A survey
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154
  • [4] Issue Report Classification Using Pre-trained Language Models
    Colavito, Giuseppe
    Lanubile, Filippo
    Novielli, Nicole
    2022 IEEE/ACM 1ST INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING (NLBSE 2022), 2022, : 29 - 32
  • [5] Pre-trained models for natural language processing: A survey
    Qiu, XiPeng
    Sun, TianXiang
    Xu, YiGe
    Shao, YunFan
    Dai, Ning
    Huang, XuanJing
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2020, 63 (10) : 1872 - 1897
  • [6] A complex network approach to analyse pre-trained language models for ancient Chinese
    Zheng, Jianyu
    Xiao, Xin'ge
    ROYAL SOCIETY OPEN SCIENCE, 2024, 11 (05):
  • [7] The Impact of Training Methods on the Development of Pre-Trained Language Models
    Uribe, Diego
    Cuan, Enrique
    Urquizo, Elisa
    COMPUTACION Y SISTEMAS, 2024, 28 (01): : 109 - 124
  • [8] Comprehensive study of pre-trained language models: detecting humor in news headlines
    Shatnawi, Farah
    Abdullah, Malak
    Hammad, Mahmoud
    Al-Ayyoub, Mahmoud
    SOFT COMPUTING, 2023, 27 (05) : 2575 - 2599
  • [9] Quantifying Gender Bias in Arabic Pre-Trained Language Models
    Alrajhi, Wafa
    Al-Khalifa, Hend S.
    Al-Salman, Abdulmalik S.
    IEEE ACCESS, 2024, 12 : 77406 - 77420