A Study on the Relevance of Generic Word Embeddings for Sentence Classification in Hepatic Surgery

Cited: 0
Authors
Oukelmoun, Achir [1 ,2 ]
Semmar, Nasredine [1 ]
de Chalendar, Gael [1 ]
Habran, Enguerrand [2 ]
Vibert, Eric [2 ]
Goblet, Emma [2 ]
Oukelmoun, Mariame [3 ]
Allard, Marc-Antoine [2 ]
Affiliations
[1] Univ Paris Saclay, CEA, List, F-91120 Palaiseau, France
[2] Chaire BOPA, Rue Chapelle Hop, F-94800 Villejuif, France
[3] Ctr Hosp Cheikh Zaid, Rabat 10000, Morocco
Source
2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA | 2023
Keywords
Natural Language Processing; Word embeddings; Gradient Boosting; hepatic surgery; transformers; classifiers; supervised learning
DOI
10.1109/AICCSA59173.2023.10479342
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
While fine-tuning large contextual language models often demands substantial computational capacity, using generic pre-trained models in highly specialized domains can yield suboptimal results. This paper explores an approach for deriving word embeddings tailored to a specific domain under limited computational resources; the proposed methods are tested on French-language texts in the domain of hepatic surgery. The setting is one in which computational constraints rule out fine-tuning large language models. A new embedding, FTW2V, which combines Word2Vec and FastText, is introduced; it addresses the challenge of representing terms absent from Word2Vec's vocabulary. Furthermore, a novel method is used to evaluate the relevance of word embeddings for a specialized corpus: the classification score distributions of Gradient Boosting classifiers trained on word embeddings from benchmarked Natural Language Processing (NLP) models are compared. Under this evaluation, the FTW2V model, trained from scratch with limited computational resources, outperforms generic contextual models in word embedding quality. Additionally, a computationally efficient contextual model built on FTW2V is introduced; it replaces Gradient Boosting with a transformer and integrates Part-of-Speech labels.
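The abstract does not give implementation details for FTW2V; as a rough illustration of the out-of-vocabulary fallback idea it describes (the class name, table layout, and toy vectors below are all hypothetical), one way to combine a Word2Vec lookup table with FastText-style subword averaging is:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style character n-grams with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

class FTW2V:
    """Sketch: serve a Word2Vec vector when the word is in vocabulary;
    otherwise average FastText-style subword (n-gram) vectors."""
    def __init__(self, w2v_vectors, ngram_vectors, dim):
        self.w2v = w2v_vectors        # word -> vector (Word2Vec table)
        self.ngrams = ngram_vectors   # n-gram -> vector (FastText table)
        self.dim = dim

    def __getitem__(self, word):
        if word in self.w2v:
            return self.w2v[word]
        vecs = [self.ngrams[g] for g in char_ngrams(word) if g in self.ngrams]
        # Unknown word with no known subwords: zero vector.
        return np.mean(vecs, axis=0) if vecs else np.zeros(self.dim)

# Toy tables standing in for trained Word2Vec / FastText parameters.
dim = 4
w2v_table = {"liver": np.arange(dim, dtype=float)}
ngram_table = {g: np.ones(dim) for g in char_ngrams("hepatectomy")}
model = FTW2V(w2v_table, ngram_table, dim)
```

In practice both tables would come from models trained on the domain corpus; the point of the combination is that a rare surgical term missing from the Word2Vec vocabulary still receives a non-trivial vector through its character n-grams.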
Pages: 8
Related Papers
50 results
  • [1] On Character vs Word Embeddings as Input for English Sentence Classification
    Hammerton, James
    Vintro, Merce
    Kapetanakis, Stelios
    Sama, Michele
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 550 - 566
  • [2] Language with vision: A study on grounded word and sentence embeddings
    Shahmohammadi, Hassan
    Heitmeier, Maria
    Shafaei-Bajestan, Elnaz
    Lensch, Hendrik P. A.
    Baayen, R. Harald
    BEHAVIOR RESEARCH METHODS, 2024, 56 (06) : 5622 - 5646
  • [3] Improving Implicit Stance Classification in Tweets Using Word and Sentence Embeddings
    Schaefer, Robin
    Stede, Manfred
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2019, 2019, 11793 : 299 - 307
  • [4] Carrier Sentence Selection with Word and Context Embeddings
    Yeung, Chak Yan
    Lee, John
    Tsou, Benjamin
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 439 - 444
  • [5] Text classification by untrained sentence embeddings
    Di Sarli, Daniele
    Gallicchio, Claudio
    Micheli, Alessio
    INTELLIGENZA ARTIFICIALE, 2020, 14 (02) : 245 - 259
  • [6] Single document summarization using word and sentence embeddings
    Ayana
    PROCEEDINGS OF THE 2015 JOINT INTERNATIONAL MECHANICAL, ELECTRONIC AND INFORMATION TECHNOLOGY CONFERENCE (JIMET 2015), 2015, 10 : 523 - 526
  • [7] Capturing Word Order in Averaging Based Sentence Embeddings
    Lee, Jae Hee
    Camacho-Collados, Jose
    Anke, Luis Espinosa
    Schockaert, Steven
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2062 - 2069
  • [8] Siamese CBOW: Optimizing Word Embeddings for Sentence Representations
    Kenter, Tom
    Borisov, Alexey
    de Rijke, Maarten
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 941 - 951
  • [9] A Neural Model for Compositional Word Embeddings and Sentence Processing
    Bernardy, Jean-Philippe
    Lappin, Shalom
    PROCEEDINGS OF THE WORKSHOP ON COGNITIVE MODELING AND COMPUTATIONAL LINGUISTICS (CMCL 2022), 2022, : 12 - 22
  • [10] INVESTIGATING THE EFFECTS OF WORD SUBSTITUTION ERRORS ON SENTENCE EMBEDDINGS
    Voleti, Rohit
    Liss, Julie M.
    Berisha, Visar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7315 - 7319