Sentence Selection Strategies for Distilling Word Embeddings from BERT

Cited by: 0
Authors
Wang, Yixiao [1 ]
Bouraoui, Zied [2 ]
Espinosa-Anke, Luis [1 ]
Schockaert, Steven [1 ]
Affiliations
[1] Cardiff Univ, Cardiff, S Glam, Wales
[2] Univ Artois, CNRS, CRIL, Arras, France
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Word Embeddings; Language Models; Natural Language Processing;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyze a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.
Pages: 2591 - 2600
Page count: 10
Related papers
50 in total
  • [31] Unsupervised Domain Adaptation for Sentimental Classification by Word Embeddings on the Lower Layer of BERT
    Bai, Jing
    Tanaka, Hirotaka
    Cao, Rui
    Ma, Wen
    Shinnou, Hiroyuki
    2019 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2019,
  • [32] Word Embeddings-based Sentence-Level Sentiment Analysis considering Word Importance
    Hayashi, Toshitaka
    Fujita, Hamido
    ACTA POLYTECHNICA HUNGARICA, 2019, 16 (07) : 7 - 24
  • [33] SENTENCE INTELLIGIBILITY AS A FUNCTION OF KEY WORD SELECTION
    DUFFY, JR
    GIOLAS, TG
    JOURNAL OF SPEECH AND HEARING RESEARCH, 1974, 17 (04) : 631 - 637
  • [34] Document Summarization Using Sentence-Level Semantic Based on Word Embeddings
    Al-Sabahi, Kamal
    Zhang Zuping
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2019, 29 (02) : 177 - 196
  • [35] Regressing Word and Sentence Embeddings for Low-Resource Neural Machine Translation
    Unanue I.J.
    Borzeshi E.Z.
    Piccardi M.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (03) : 450 - 463
  • [36] Comparing the Performance of Neural and Statistical Sentence Embeddings on Summarization and Word Sense Disambiguation
    Juvekar, Gaurav
    Lolage, Abhishek
    Sahasrabudhe, Dhruva
    Haribhakta, Yashodhara
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 1787 - 1792
  • [37] Contrastive Learning of Sentence Embeddings from Scratch
    Zhang, Junlei
    Lan, Zhenzhong
    He, Junxian
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3916 - 3932
  • [38] Feature selection based on word-sentence relation
    Schönhofen, P
    Benczúr, AA
    ICMLA 2005: FOURTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2005, : 37 - 42
  • [39] Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
    Wang, Shufan
    Thompson, Laure
    Iyyer, Mohit
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 10837 - 10851
  • [40] Word and Word Order: From Word and Collocation to Sentence and Word Order
    Kacala, Jan
    ESLAVISTICA COMPLUTENSE, 2012, 12 : 87 - 95