What does the language system look like in pre-trained language models? A study using complex networks

Cited by: 1
Author
Zheng, Jianyu [1]
Affiliation
[1] Tsinghua Univ, Dept Chinese Language & Literature, Beijing 100084, Peoples R China
Keywords
Language model; BERT; Complex network; Language system
DOI
10.1016/j.knosys.2024.111984
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities that PLMs exhibit on NLP tasks have attracted researchers to explore the factors underlying their success. However, most existing work focuses on specific pieces of linguistic knowledge encoded in PLMs rather than investigating how these models comprehend language from a holistic perspective; in particular, it cannot explain how PLMs organize the language system as a whole. We therefore adopt a complex-network approach to represent the language system and investigate how language elements are organized within it. Specifically, we take as our research object the attention relationships among words generated by the attention heads in BERT models: words are treated as nodes, and the connections between words and their most-attending words are represented as edges. After obtaining these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. Several constructive conclusions emerge: (1) the English attention networks demonstrate exceptional performance in organizing words; (2) most words' attention networks exhibit the small-world property and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance among language groups; and (4) in the cross-layer analysis, the networks from layers 8 to 10 in Chinese BERT and layers 6 to 9 in English BERT exhibit more consistent characteristics. Our study provides a comprehensive explanation of how PLMs organize language systems, which can be used to evaluate and develop improved models.
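To make the construction in the abstract concrete, the sketch below builds one such "words' attention network" with HuggingFace Transformers and NetworkX and computes a few of the metrics mentioned above. It is a minimal sketch under stated assumptions, not the paper's exact pipeline: the model name (bert-base-uncased), the example sentence, and the layer/head indices are illustrative, the network is built over the subword tokens of a single sentence rather than words across a corpus, and "most-attending word" is read here as the token that each token attends to most strongly.

# Minimal sketch (assumptions noted above): one BERT attention head is turned
# into a directed network whose nodes are token positions and whose edges point
# from each token to the token it attends to most strongly.
import networkx as nx
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "Complex networks can describe how words are organized in a language system."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer.
layer, head = 8, 0                          # illustrative indices only
attn = outputs.attentions[layer][0, head]   # shape (seq_len, seq_len)

G = nx.DiGraph()
for i, tok in enumerate(tokens):
    G.add_node(i, token=tok)
for i in range(len(tokens)):
    j = int(attn[i].argmax())               # strongest attention target of token i
    if j != i:                              # ignore self-attention edges
        G.add_edge(i, j, weight=float(attn[i, j]))

# Basic network metrics; assessing small-world or scale-free structure would
# additionally require random-graph baselines and a degree-distribution fit.
und = G.to_undirected()
print("nodes:", und.number_of_nodes(), "edges:", und.number_of_edges())
print("average clustering:", nx.average_clustering(und))
if nx.is_connected(und):
    print("average shortest path length:", nx.average_shortest_path_length(und))

In the paper's setting, such networks are presumably built per attention head over much larger text samples before the metrics (clustering, path length, degree distribution) are compared across layers and languages; the snippet only shows the single-sentence skeleton of that procedure.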
Pages: 11
Related Papers (50 in total)
  • [1] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121
  • [2] A Comparative Study of Using Pre-trained Language Models for Toxic Comment Classification
    Zhao, Zhixue
    Zhang, Ziqi
    Hopfgartner, Frank
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 500 - 507
  • [3] Pre-trained language models in medicine: A survey
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154
  • [4] Issue Report Classification Using Pre-trained Language Models
    Colavito, Giuseppe
    Lanubile, Filippo
    Novielli, Nicole
    2022 IEEE/ACM 1ST INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING (NLBSE 2022), 2022, : 29 - 32
  • [5] Pre-trained models for natural language processing: A survey
    Qiu, XiPeng
    Sun, TianXiang
    Xu, YiGe
    Shao, YunFan
    Dai, Ning
    Huang, XuanJing
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2020, 63 (10) : 1872 - 1897
  • [6] A complex network approach to analyse pre-trained language models for ancient Chinese
    Zheng, Jianyu
    Xiao, Xin'ge
    ROYAL SOCIETY OPEN SCIENCE, 2024, 11 (05):
  • [7] The Impact of Training Methods on the Development of Pre-Trained Language Models
    Uribe, Diego
    Cuan, Enrique
    Urquizo, Elisa
    COMPUTACION Y SISTEMAS, 2024, 28 (01): : 109 - 124
  • [8] Comprehensive study of pre-trained language models: detecting humor in news headlines
    Shatnawi, Farah
    Abdullah, Malak
    Hammad, Mahmoud
    Al-Ayyoub, Mahmoud
    SOFT COMPUTING, 2023, 27 (05) : 2575 - 2599
  • [9] Quantifying Gender Bias in Arabic Pre-Trained Language Models
    Alrajhi, Wafa
    Al-Khalifa, Hend S.
    Al-Salman, Abdulmalik S.
    IEEE ACCESS, 2024, 12 : 77406 - 77420