Learning Word Representations from Scarce and Noisy Data with Embedding Sub-spaces

Cited by: 0
Authors
Astudillo, Ramon F. [1 ]
Amir, Silvio [1 ]
Lin, Wang [1 ]
Silva, Mario [1 ]
Trancoso, Isabel [1 ]
Affiliation
[1] Inst Engn Sistemas & Comp Invest & Desenvolviment, Rua Alves Redol 9, Lisbon, Portugal
Source
PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 | 2015
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate a technique to adapt unsupervised word embeddings to specific applications when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters, and then use the labeled data to tailor them for the intended task. However, this approach is prone to overfitting when training is performed with scarce and noisy data. To overcome this issue, we use the supervised data to find an embedding subspace that fits the task complexity. All the word representations are adapted through a projection into this task-specific subspace, even if they do not occur in the labeled dataset. This approach was recently used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results. Here we show results improving on those of the challenge, as well as additional experiments in a Twitter Part-Of-Speech tagging task.
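The subspace idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dimensions, the `tanh` non-linearity, the bag-of-words sum, the plain SGD loop, and all names (`E`, `S`, `Y`, `forward`) are assumptions chosen for illustration. The pre-trained embedding matrix `E` is kept frozen, and only a small projection `S` and a classifier `Y` are fitted on the labeled data, so words that never appear in the labeled set are still adapted through the same projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, embedding dim, subspace dim, number of labels.
V, e, s, n_classes = 20, 50, 5, 3

E = rng.normal(size=(e, V))   # stand-in for pre-trained embeddings (frozen)
E_before = E.copy()           # kept only to check that E is never modified

S = rng.normal(scale=0.1, size=(s, e))          # task-specific projection (learned)
Y = rng.normal(scale=0.1, size=(n_classes, s))  # classifier on the subspace (learned)

# Toy labeled data (word indices, label); only words 0..9 ever occur here.
data = [([0, 1, 2], 0), ([3, 4], 1), ([5, 6, 7], 2), ([8, 9, 1], 0)]

def forward(words):
    """Sum the frozen embeddings, project into the subspace, classify."""
    x = E[:, words].sum(axis=1)        # bag-of-words input, shape (e,)
    z = np.tanh(S @ x)                 # subspace representation, shape (s,)
    logits = Y @ z
    logits = logits - logits.max()     # numerical stability for the softmax
    p = np.exp(logits) / np.exp(logits).sum()
    return x, z, p

lr = 0.05
for epoch in range(100):
    for words, label in data:
        x, z, p = forward(words)
        d_logits = p.copy()
        d_logits[label] -= 1.0                    # gradient of cross-entropy w.r.t. logits
        d_z = (Y.T @ d_logits) * (1.0 - z ** 2)   # backprop through tanh
        Y -= lr * np.outer(d_logits, z)           # update classifier
        S -= lr * np.outer(d_z, x)                # update projection; E is untouched

# Every word in the vocabulary is projected, including words 10..19,
# which never occurred in the labeled data.
projected = np.tanh(S @ E)   # shape (s, V)
```

Because only `S` and `Y` are trained, the number of fitted parameters is `s * e + n_classes * s` rather than `e * V`, which is one way to read the abstract's point about fitting the task complexity with scarce data.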
Pages: 1074-1084
Page count: 11
Related papers
50 in total
  • [21] Learning programs from noisy data
    Raychev V.
    Bielik P.
    Vechev M.
    Krause A.
    1600, Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, United States (51): : 761 - 774
  • [22] Gaussian processes meet NeuralODEs: a Bayesian framework for learning the dynamics of partially observed systems from scarce and noisy data
    Bhouri, Mohamed Aziz
    Perdikaris, Paris
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2022, 380 (2229):
  • [23] Learning distributed representations of relational data using linear relational embedding
    Paccanaro, A
    Hinton, GE
    NEURAL NETS WIRN VIETRI-01, 2002, : 134 - 143
  • [24] Learning multi-prototype word embedding from single-prototype word embedding with integrated knowledge
    Yang, Xuefeng
    Mao, Kezhi
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 56 : 291 - 299
  • [25] Learning from Noisy Data with Robust Representation Learning
    Li, Junnan
    Xiong, Caiming
    Hoi, Steven C. H.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9465 - 9474
  • [26] Predictive Student Modeling in Game-Based Learning Environments with Word Embedding Representations of Reflection
    Geden, Michael
    Emerson, Andrew
    Carpenter, Dan
    Rowe, Jonathan
    Azevedo, Roger
    Lester, James
    INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION, 2021, 31 (01) : 1 - 23
  • [27] Predictive Student Modeling in Game-Based Learning Environments with Word Embedding Representations of Reflection
    Michael Geden
    Andrew Emerson
    Dan Carpenter
    Jonathan Rowe
    Roger Azevedo
    James Lester
    International Journal of Artificial Intelligence in Education, 2021, 31 : 1 - 23
  • [28] Learning Explanatory Rules from Noisy Data
    Evans, Richard
    Grefenstette, Edward
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61 : 1 - 64
  • [29] Robust Graph Learning From Noisy Data
    Kang, Zhao
    Pan, Haiqi
    Hoi, Steven C. H.
    Xu, Zenglin
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (05) : 1833 - 1843
  • [30] Learning explanatory rules from noisy data
    1600, AI Access Foundation (61):