A Study on the Relevance of Generic Word Embeddings for Sentence Classification in Hepatic Surgery

Cited: 0
Authors
Oukelmoun, Achir [1 ,2 ]
Semmar, Nasredine [1 ]
de Chalendar, Gael [1 ]
Habran, Enguerrand [2 ]
Vibert, Eric [2 ]
Goblet, Emma [2 ]
Oukelmoun, Mariame [3 ]
Allard, Marc-Antoine [2 ]
Affiliations
[1] Univ Paris Saclay, CEA, List, F-91120 Palaiseau, France
[2] Chaire BOPA, Rue Chapelle Hop, F-94800 Villejuif, France
[3] Ctr Hosp Cheikh Zaid, Rabat 10000, Morocco
Source
2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA | 2023
Keywords
Natural Language Processing; Word embeddings; Gradient Boosting; hepatic surgery; transformers; classifiers; supervised learning
DOI
10.1109/AICCSA59173.2023.10479342
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
While fine-tuning large contextual language models often demands substantial computational capacity, using generic pre-trained models in highly specialized domains can yield suboptimal results. This paper explores an approach for deriving word embeddings tailored to a specific domain under limited computational resources; the proposed methods are tested on French-language texts in the domain of hepatic surgery. The setting is one in which computational constraints rule out fine-tuning large language models. A new embedding, FTW2V, which combines Word2Vec and FastText, is introduced; it addresses the challenge of representing terms absent from Word2Vec's vocabulary. Furthermore, a novel method is used to evaluate the relevance of word embeddings for a specialized corpus: the classification score distributions of Gradient Boosting classifiers trained on word embeddings from benchmarked Natural Language Processing (NLP) models are compared. Under this evaluation, the FTW2V model, trained from scratch with limited computational resources, outperforms generic contextual models in word embedding quality. Additionally, a computationally efficient contextual model built on FTW2V is introduced; it replaces Gradient Boosting with a transformer and integrates Part-of-Speech labels.
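The abstract does not give implementation details for FTW2V; as a rough illustration of the out-of-vocabulary fallback idea it describes (the class name, table layout, and toy vectors below are all hypothetical), one way to combine a Word2Vec lookup table with FastText-style subword averaging is:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style character n-grams with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

class FTW2V:
    """Sketch: serve a Word2Vec vector when the word is in vocabulary;
    otherwise average FastText-style subword (n-gram) vectors."""
    def __init__(self, w2v_vectors, ngram_vectors, dim):
        self.w2v = w2v_vectors        # word -> vector (Word2Vec table)
        self.ngrams = ngram_vectors   # n-gram -> vector (FastText table)
        self.dim = dim

    def __getitem__(self, word):
        if word in self.w2v:
            return self.w2v[word]
        vecs = [self.ngrams[g] for g in char_ngrams(word) if g in self.ngrams]
        # Unknown word with no known subwords: zero vector.
        return np.mean(vecs, axis=0) if vecs else np.zeros(self.dim)

# Toy tables standing in for trained Word2Vec / FastText parameters.
dim = 4
w2v_table = {"liver": np.arange(dim, dtype=float)}
ngram_table = {g: np.ones(dim) for g in char_ngrams("hepatectomy")}
model = FTW2V(w2v_table, ngram_table, dim)
```

In practice both tables would come from models trained on the domain corpus; the point of the combination is that a rare surgical term missing from the Word2Vec vocabulary still receives a non-trivial vector through its character n-grams.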
Pages: 8
Related Papers
50 results
  • [1] On Character vs Word Embeddings as Input for English Sentence Classification
    Hammerton, James
    Vintro, Merce
    Kapetanakis, Stelios
    Sama, Michele
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 550 - 566
  • [2] Language with vision: A study on grounded word and sentence embeddings
    Shahmohammadi, Hassan
    Heitmeier, Maria
    Shafaei-Bajestan, Elnaz
    Lensch, Hendrik P. A.
    Baayen, R. Harald
    BEHAVIOR RESEARCH METHODS, 2024, 56 (06) : 5622 - 5646
  • [3] Improving Implicit Stance Classification in Tweets Using Word and Sentence Embeddings
    Schaefer, Robin
    Stede, Manfred
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2019, 2019, 11793 : 299 - 307
  • [4] Carrier Sentence Selection with Word and Context Embeddings
    Yeung, Chak Yan
    Lee, John
    Tsou, Benjamin
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 439 - 444
  • [5] Text classification by untrained sentence embeddings
    Di Sarli, Daniele
    Gallicchio, Claudio
    Micheli, Alessio
    INTELLIGENZA ARTIFICIALE, 2020, 14 (02) : 245 - 259
  • [6] Single document summarization using word and sentence embeddings
    Ayana
    PROCEEDINGS OF THE 2015 JOINT INTERNATIONAL MECHANICAL, ELECTRONIC AND INFORMATION TECHNOLOGY CONFERENCE (JIMET 2015), 2015, 10 : 523 - 526
  • [7] Capturing Word Order in Averaging Based Sentence Embeddings
    Lee, Jae Hee
    Camacho-Collados, Jose
    Anke, Luis Espinosa
    Schockaert, Steven
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2062 - 2069
  • [8] Siamese CBOW: Optimizing Word Embeddings for Sentence Representations
    Kenter, Tom
    Borisov, Alexey
    de Rijke, Maarten
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 941 - 951
  • [9] A Neural Model for Compositional Word Embeddings and Sentence Processing
    Bernardy, Jean-Philippe
    Lappin, Shalom
    PROCEEDINGS OF THE WORKSHOP ON COGNITIVE MODELING AND COMPUTATIONAL LINGUISTICS (CMCL 2022), 2022, : 12 - 22
  • [10] INVESTIGATING THE EFFECTS OF WORD SUBSTITUTION ERRORS ON SENTENCE EMBEDDINGS
    Voleti, Rohit
    Liss, Julie M.
    Berisha, Visar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7315 - 7319