What does the language system look like in pre-trained language models? A study using complex networks

Cited by: 1
Authors
Zheng, Jianyu [1 ]
Affiliations
[1] Tsinghua Univ, Dept Chinese Language & Literature, Beijing 100084, Peoples R China
Keywords
Language model; BERT; Complex network; Language system;
DOI
10.1016/j.knosys.2024.111984
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities that PLMs exhibit on NLP tasks have attracted researchers to explore the factors underlying their success. However, most existing work focuses on specific pieces of linguistic knowledge encoded in PLMs rather than investigating how these models comprehend language from a holistic perspective, and it cannot explain how PLMs organize the language system as a whole. We therefore adopt a complex-network approach to represent the language system and investigate how language elements are organized within it. Specifically, we study the attention relationships among words generated by the attention heads of BERT models: words are treated as nodes, and the connection between each word and the word it attends to most is represented as an edge. After constructing these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. We draw several constructive conclusions, including: (1) the English attention networks are exceptionally effective at organizing words; (2) most words' attention networks exhibit small-world properties and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance across language groups; and (4) in cross-layer analysis, the networks from layers 8 to 10 of Chinese BERT and layers 6 to 9 of English BERT exhibit more consistent characteristics. Our study provides a comprehensive account of how PLMs organize the language system, which can be used to evaluate and develop improved models.
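The construction described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' implementation; it is a minimal illustration, assuming the `bert-base-uncased` checkpoint, an arbitrary layer/head choice, and a toy sentence, of how one might turn a BERT attention head into a directed word network (each token linked to the token it attends to most) and compute basic metrics with networkx.

```python
# Minimal sketch (not the paper's code): build a "word attention network" from one
# BERT attention head and compute simple network metrics. The model name, the
# layer/head indices, and the example sentence are illustrative assumptions.
import networkx as nx
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "Complex networks offer a holistic view of the language system."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

layer, head = 8, 0                            # assumed choice of attention head
attn = outputs.attentions[layer][0, head]     # (seq_len, seq_len) attention matrix
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Directed attention network: token i -> the token it attends to most
# (self-attention excluded), following the "most-attending word" rule.
G = nx.DiGraph()
G.add_nodes_from((i, {"token": t}) for i, t in enumerate(tokens))
for i in range(len(tokens)):
    weights = attn[i].clone()
    weights[i] = 0.0                          # ignore attention to itself
    j = int(torch.argmax(weights))
    G.add_edge(i, j, weight=float(weights[j]))

# Basic metrics of the kind analyzed in the paper (clustering, degree distribution).
UG = G.to_undirected()
print("average clustering:", nx.average_clustering(UG))
print("degree sequence:", sorted(d for _, d in UG.degree()))
```

In the paper, such networks are built per attention head and per layer over large corpora; this toy example only shows the node-and-edge construction and metric computation for a single sentence.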
Pages: 11
Related papers
50 records in total
  • [21] Addressing Extraction and Generation Separately: Keyphrase Prediction With Pre-Trained Language Models
    Liu, Rui
    Lin, Zheng
    Wang, Weiping
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3180 - 3191
  • [22] LMs go Phishing: Adapting Pre-trained Language Models to Detect Phishing Emails
    Misra, Kanishka
    Rayz, Julia Taylor
    2022 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT, 2022, : 135 - 142
  • [23] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    Qiang, Jipeng
    Zhang, Feng
    Li, Yun
    Yuan, Yunhao
    Zhu, Yi
    Wu, Xindong
    FRONTIERS OF COMPUTER SCIENCE, 2023, 17 (01)
  • [26] Aspect-Based Sentiment Analysis in Hindi Language by Ensembling Pre-Trained mBERT Models
    Pathak, Abhilash
    Kumar, Sudhanshu
    Roy, Partha Pratim
    Kim, Byung-Gyu
    ELECTRONICS, 2021, 10 (21)
  • [27] A Comparison of Pre-Trained Language Models for Multi-Class Text Classification in the Financial Domain
    Arslan, Yusuf
    Allix, Kevin
    Veiber, Lisa
    Lothritz, Cedric
    Bissyande, Tegawende F.
    Klein, Jacques
    Goujon, Anne
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 260 - 268
  • [28] Aspect-Based Sentiment Analysis of Social Media Data With Pre-Trained Language Models
    Troya, Anina
    Pillai, Reshmi Gopalakrishna
    Rivero, Cristian Rodriguez
    Genc, Zulkuf
    Kayal, Subhradeep
    Araci, Dogu
    2021 5TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2021, 2021, : 8 - 17
  • [29] Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks
    Ballout, Mohamad
    Krumnack, Ulf
    Heidemann, Gunther
    Kuehnberger, Kai-Uwe
    EXPLAINABLE ARTIFICIAL INTELLIGENCE, XAI 2023, PT III, 2023, 1903 : 3 - 25
  • [30] Disfluencies and Fine-Tuning Pre-trained Language Models for Detection of Alzheimer's Disease
    Yuan, Jiahong
    Bian, Yuchen
    Cai, Xingyu
    Huang, Jiaji
    Ye, Zheng
    Church, Kenneth
    INTERSPEECH 2020, 2020, : 2162 - 2166