What does the language system look like in pre-trained language models? A study using complex networks

Cited by: 1
Author
Zheng, Jianyu [1 ]
Affiliation
[1] Tsinghua Univ, Dept Chinese Language & Literature, Beijing 100084, Peoples R China
Keywords
Language model; BERT; Complex network; Language system
DOI
10.1016/j.knosys.2024.111984
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities that PLMs exhibit on NLP tasks have attracted researchers to explore the factors underlying their success. However, most existing work focuses on specific pieces of linguistic knowledge encoded in PLMs rather than on how these models comprehend language from a holistic perspective, and it does not explain how PLMs organize the language system as a whole. We therefore adopt a complex network approach to represent the language system and investigate how language elements are organized within it. Specifically, we study the attention relationships among words generated by the attention heads of BERT models: words are treated as nodes, and the connections between words and the words they attend to most are represented as edges. After constructing these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. Several constructive conclusions are drawn: (1) the English attention networks demonstrate exceptional performance in organizing words; (2) most words' attention networks exhibit the small-world property and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance among language groups; (4) in the cross-layer analysis, the networks from layers 8 to 10 of Chinese BERT and layers 6 to 9 of English BERT exhibit more consistent characteristics. Our study provides a comprehensive account of how PLMs organize the language system, which can be used to evaluate and develop improved models.
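As a rough illustration of the construction described in the abstract, the sketch below builds one such attention network for a single sentence using the Hugging Face transformers library and networkx. The model name, the choice of layer and head, the use of WordPiece tokens in place of whole words, and the metrics reported are illustrative assumptions, not the authors' exact experimental setup.

# Minimal sketch (assumed details): build a "word attention network" for one
# sentence from one BERT attention head, then compute basic network metrics.
import networkx as nx
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

text = "Pre-trained language models have advanced natural language processing."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
layer, head = 8, 0  # illustrative choice, not the paper's setting
attn = outputs.attentions[layer][0, head]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Nodes are tokens (a word-level version would first merge WordPiece pieces);
# each token gets an edge to the token it attends to most strongly.
G = nx.DiGraph()
G.add_nodes_from(range(len(tokens)))
for i in range(len(tokens)):
    j = int(attn[i].argmax())
    if i != j:
        G.add_edge(i, j)

# Small-world-style metrics on the largest connected component of the
# undirected view, plus the degree sequence for a scale-free check.
U = G.to_undirected()
largest = U.subgraph(max(nx.connected_components(U), key=len))
print("clustering coefficient:", nx.average_clustering(largest))
print("avg shortest path length:", nx.average_shortest_path_length(largest))
print("degree sequence:", sorted((d for _, d in largest.degree()), reverse=True))

On a full corpus, the same construction would be repeated for every head and layer, and the resulting clustering coefficients, path lengths, and degree distributions compared across languages and layers, as the abstract describes.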
Pages: 11
Related papers
50 items in total
  • [41] English-Assamese neural machine translation using prior alignment and pre-trained language model
    Laskar, Sahinur Rahman
    Paul, Bishwaraj
    Dadure, Pankaj
    Manna, Riyanka
    Pakray, Partha
    Bandyopadhyay, Sivaji
    COMPUTER SPEECH AND LANGUAGE, 2023, 82
  • [42] Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code
    Tagarelli, Andrea
    Simeri, Andrea
    ARTIFICIAL INTELLIGENCE AND LAW, 2022, 30 (03) : 417 - 473
  • [44] Learning to Predict US Policy Change Using New York Times Corpus with Pre-Trained Language Model
    Zhang, Guoshuai
    Wu, Jiaji
    Tan, Mingzhou
    Yang, Zhongjie
    Cheng, Qingyu
    Han, Hong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (45-46) : 34227 - 34240
  • [45] Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
    Christian, Hans
    Suhartono, Derwin
    Chowanda, Andry
    Zamli, Kamal Z.
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [47] Framing and BERTology: A Data-Centric Approach to Integration of Linguistic Features into Transformer-Based Pre-trained Language Models
    Avetisyan, Hayastan
    Safikhani, Parisa
    Broneske, David
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2023, 2024, 825 : 81 - 90
  • [48] Incorporation of company-related factual knowledge into pre-trained language models for stock-related spam tweet filtering
    Park, Jihye
    Cho, Sungzoon
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [49] Puer at SemEval-2024 Task 4: Fine-tuning Pre-trained Language Models for Meme Persuasion Technique Detection
    Dao, Jiaxu
    Li, Zhuoying
    Su, Youbang
    Gong, Wensheng
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 64 - 69
  • [50] FedITD: A Federated Parameter-Efficient Tuning With Pre-Trained Large Language Models and Transfer Learning Framework for Insider Threat Detection
    Wang, Zhi Qiang
    Wang, Haopeng
    El Saddik, Abdulmotaleb
    IEEE ACCESS, 2024, 12 : 160396 - 160417