RS-BERT: Pre-training radical enhanced sense embedding for Chinese word sense disambiguation

Cited: 3
Authors
Zhou, Xiaofeng [1 ,2 ,3 ]
Huang, Heyan [1 ,2 ,3 ]
Chi, Zewen [1 ,2 ,3 ]
Ren, Mucheng [1 ,2 ,3 ]
Gao, Yang [1 ,2 ,3 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Southeast Acad Informat Technol, Putian 351100, Fujian, Peoples R China
[3] Beijing Engn Res Ctr High Volume Language Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sense embeddings; Language models; Word sense disambiguation;
DOI
10.1016/j.ipm.2024.103740
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Word sense disambiguation is a crucial task for testing whether a model is capable of deep language understanding. The rise of pre-trained language models has brought substantial success on such tasks. However, most current pre-training objectives operate at the token level while ignoring the linguistic senses of the tokens themselves, so it is questionable whether token-predicting objectives are enough to learn polysemy and disambiguate senses. To explore this question, we introduce RS-BERT, a radical-enhanced sense embedding model with a novel pre-training objective, sense-aware language modeling, which provides the model with additional sense-level information. At each training step, we first predict the senses and then update the model given the predicted senses, alternating between these two steps in an expectation-maximization manner. In addition, we inject radical information into RS-BERT at the beginning of pre-training. We conduct experiments on two Chinese word sense disambiguation datasets. The results show that RS-BERT is competitive and, when combined with other dedicated adaptations for specific datasets, achieves impressive performance. Moreover, our analysis shows that RS-BERT successfully clusters Chinese characters into distinct senses. These results demonstrate that token-predicting objectives alone are not enough and that the sense-level objective performs better for polysemy and sense disambiguation.
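The alternating procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual RS-BERT implementation: the toy encoder, the hard nearest-sense assignment, the loss terms, and all names (SenseAwareModel, num_senses, sense_embeddings) are hypothetical stand-ins, and the radical-information component is omitted. It only shows the EM-style loop: an E-step that predicts a sense for each token, and an M-step that updates the model given those predicted senses.

```python
# Hypothetical sketch of EM-style sense-aware training (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SenseAwareModel(nn.Module):
    """Toy encoder with a bank of candidate sense embeddings per vocabulary entry."""
    def __init__(self, vocab_size=100, num_senses=3, dim=32):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, dim)  # stand-in for a BERT encoder
        self.sense_embeddings = nn.Parameter(torch.randn(vocab_size, num_senses, dim))
        self.lm_head = nn.Linear(dim, vocab_size)

    def contextual(self, tokens):
        return self.encoder(tokens)  # (batch, seq, dim)

model = SenseAwareModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 100, (8, 16))  # dummy batch of token ids

for step in range(100):
    hidden = model.contextual(tokens)
    # E-step: assign each token the sense whose embedding is most similar to
    # its contextual representation (no gradient through the assignment).
    with torch.no_grad():
        senses = model.sense_embeddings[tokens]            # (batch, seq, num_senses, dim)
        sims = torch.einsum("bsd,bskd->bsk", hidden, senses)
        sense_ids = sims.argmax(dim=-1)                    # hard sense assignment

    # M-step: update the model given the predicted senses, here by pulling each
    # contextual vector toward its assigned sense embedding, alongside a
    # token-level language-modeling loss.
    chosen = torch.gather(
        model.sense_embeddings[tokens], 2,
        sense_ids[..., None, None].expand(-1, -1, 1, hidden.size(-1)),
    ).squeeze(2)
    sense_loss = F.mse_loss(hidden, chosen)
    lm_loss = F.cross_entropy(model.lm_head(hidden).transpose(1, 2), tokens)
    loss = lm_loss + sense_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under this reading, the sense assignment plays the role of the E-step's latent-variable inference and the gradient update plays the M-step, which is one common way to realize the "predict senses, then update given predicted senses" alternation the abstract describes.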
Pages: 14