Learning Word Representations from Scarce and Noisy Data with Embedding Sub-spaces

Cited by: 0
Authors
Astudillo, Ramon F. [1]
Amir, Silvio [1]
Lin, Wang [1]
Silva, Mario [1]
Trancoso, Isabel [1]
Affiliations
[1] Inst Engn Sistemas & Comp Invest & Desenvolviment, Rua Alves Redol 9, Lisbon, Portugal
Source
PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 | 2015
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate a technique to adapt unsupervised word embeddings to specific applications when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters and then use the labeled data to tailor them to the intended task. However, this approach is prone to overfitting when training is performed with scarce and noisy data. To overcome this issue, we use the supervised data to find an embedding subspace that fits the task complexity. All the word representations are adapted through a projection into this task-specific subspace, even if they do not occur in the labeled dataset. This approach was recently used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results. Here we show results improving on those of the challenge, as well as additional experiments on a Twitter Part-Of-Speech tagging task.
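The core idea in the abstract, keeping the pre-trained embeddings fixed and learning only a projection into a small task-specific subspace, can be illustrated with a minimal NumPy sketch. This is not the authors' model: it assumes a purely linear projection, a bag-of-words sentence representation, a softmax classifier, and random stand-in data. The names `E`, `S`, `Y`, `forward`, and `sgd_step` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, sub_dim, n_classes = 50, 20, 5, 2

# Stand-in for pre-trained unsupervised embeddings; never updated below.
E = rng.normal(size=(embed_dim, vocab_size))
# Trainable parameters: projection into the subspace and classifier weights.
S = rng.normal(scale=0.1, size=(sub_dim, embed_dim))
Y = rng.normal(scale=0.1, size=(n_classes, sub_dim))


def forward(word_ids, S, Y):
    """Sum the fixed embeddings, project into the subspace, classify."""
    x = E[:, word_ids].sum(axis=1)       # shape (embed_dim,)
    h = S @ x                            # shape (sub_dim,)
    logits = Y @ h
    p = np.exp(logits - logits.max())    # numerically stable softmax
    return x, h, p / p.sum()


def sgd_step(word_ids, label, S, Y, lr=0.01):
    """One cross-entropy gradient step on S and Y only (E stays frozen)."""
    x, h, p = forward(word_ids, S, Y)
    d_logits = p.copy()
    d_logits[label] -= 1.0               # dLoss/dlogits for softmax + CE
    grad_Y = np.outer(d_logits, h)
    grad_S = np.outer(Y.T @ d_logits, x)
    return S - lr * grad_S, Y - lr * grad_Y


# Toy labeled set that only ever uses word ids 0..9 (random labels).
train = [(rng.integers(0, 10, size=4), i % 2) for i in range(20)]

unseen_word = 42                         # never appears in `train`
E_before = E.copy()
before = S @ E[:, unseen_word]

for _ in range(10):
    for ids, label in train:
        S, Y = sgd_step(ids, label, S, Y)

after = S @ E[:, unseen_word]
print("unseen word moved by", np.linalg.norm(after - before))
```

Because the projection `S` is shared across the vocabulary, the subspace representation of word 42 changes even though that word never occurs in the toy labeled set, while `E` itself is left untouched. The number of trainable parameters is `sub_dim * embed_dim` rather than `embed_dim * vocab_size`, which is the intuition behind the reduced overfitting risk described above.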
Pages: 1074-1084
Number of pages: 11
Related papers
50 in total
  • [1] Multi-Label learning in the independent label sub-spaces
    Barezi, Elham J.
    Kwok, James T.
    Rabiee, Hamid R.
    PATTERN RECOGNITION LETTERS, 2017, 97 : 8 - 12
  • [2] Learning Clustered Sub-spaces for Sketch-based Image Retrieval
    Ghosal, Koustav
    Prabhu, Ameya
    Dasgupta, Riddhiman
    Namboodiri, Anoop M.
    PROCEEDINGS 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION ACPR 2015, 2015, : 599 - 603
  • [3] An algorithm for learning representations of models with scarce data
    de Wynter, Adrian
    INFORMATION GEOMETRY, 2024, 7 (02) : 489 - 521
  • [4] Learning Multiple Non-Linear Sub-Spaces using K-RBMs
    Chandra, Siddhartha
    Kumar, Shailesh
    Jawahar, C. V.
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2778 - 2785
  • [5] Chinese Word Embedding Learning with Limited Data
    Chen, Shurui
    Chen, Yufu
    Lu, Yuyin
    Rao, Yanghui
    Xie, Haoran
    Li, Qing
    WEB AND BIG DATA, APWEB-WAIM 2021, PT I, 2021, 12858 : 211 - 226
  • [6] Learning Multimodal Word Representations by Explicitly Embedding Syntactic and Phonetic Information
    Zhu, Wenhao
    Liu, Shuang
    Liu, Chaoming
    Yin, Xiaoya
    Xv, Xiaping
    IEEE ACCESS, 2020, 8 : 223306 - 223315
  • [8] Local ensemble learning from imbalanced and noisy data for word sense disambiguation
    Krawczyk, Bartosz
    McInnes, Bridget T.
    PATTERN RECOGNITION, 2018, 78 : 103 - 119
  • [9] Multi-label learning in the independent label sub-spaces (vol 97, pg 8, 2017)
    Barezi, Elham J.
    Kwok, James T.
    Rabiee, Hamid R.
    PATTERN RECOGNITION LETTERS, 2018, 112 : 152 - 152
  • [10] Learning Fuzzy Set Representations of Partial Shapes on Dual Embedding Spaces
    Sung, Minhyuk
    Dubrovina, Anastasia
    Kim, Vladimir G.
    Guibas, Leonidas
    COMPUTER GRAPHICS FORUM, 2018, 37 (05) : 71 - 81