Learning Word Representations from Scarce and Noisy Data with Embedding Sub-spaces

Cited by: 0
Authors
Astudillo, Ramon F. [1 ]
Amir, Silvio [1 ]
Lin, Wang [1 ]
Silva, Mario [1 ]
Trancoso, Isabel [1 ]
Affiliations
[1] Inst Engn Sistemas & Comp Invest & Desenvolviment, Rua Alves Redol 9, Lisbon, Portugal
Source
PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 | 2015
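Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code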
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate a technique to adapt unsupervised word embeddings to specific applications when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters, and then use the labeled data to tailor them for the intended task. However, this approach is prone to overfitting when training is performed with scarce and noisy data. To overcome this issue, we use the supervised data to find an embedding subspace that fits the task complexity. All word representations are adapted through a projection into this task-specific subspace, even if they do not occur in the labeled dataset. This approach was recently used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results. Here we show results improving on those of the challenge, as well as additional experiments on a Twitter Part-Of-Speech tagging task.
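The subspace idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings `E`, the bag-of-words sentence vector, the logistic classifier `v`, and all function names are assumptions made only for illustration. The pre-trained embeddings stay fixed, and only a small projection matrix `S` and the classifier are learned from the labeled data, so words absent from the labeled set are still adapted through the same projection.

```python
import math
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def sentence_vector(E, words):
    """Sum the fixed pre-trained embeddings of the words (bag of words)."""
    dim = len(next(iter(E.values())))
    x = [0.0] * dim
    for w in words:
        for j, val in enumerate(E[w]):
            x[j] += val
    return x

def predict(E, S, v, words):
    """Project into the subspace with S, then apply a logistic classifier v."""
    h = matvec(S, sentence_vector(E, words))
    z = sum(vi * hi for vi, hi in zip(v, h))
    return 1.0 / (1.0 + math.exp(-z))

def train_subspace(E, data, sub_dim, epochs=200, lr=0.1, seed=0):
    """Learn only S (sub_dim x dim) and v; the embeddings E are never updated."""
    rng = random.Random(seed)
    dim = len(next(iter(E.values())))
    S = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(sub_dim)]
    v = [rng.uniform(-0.1, 0.1) for _ in range(sub_dim)]
    for _ in range(epochs):
        for words, label in data:
            x = sentence_vector(E, words)
            h = matvec(S, x)
            z = sum(vi * hi for vi, hi in zip(v, h))
            dz = 1.0 / (1.0 + math.exp(-z)) - label  # gradient of log loss w.r.t. z
            grad_v = [dz * hi for hi in h]
            grad_h = [dz * vi for vi in v]  # computed before v is updated
            for i in range(sub_dim):
                v[i] -= lr * grad_v[i]
                for j in range(dim):
                    S[i][j] -= lr * grad_h[i] * x[j]
    return S, v

# Toy 4-dimensional "pre-trained" embeddings (hypothetical values).
E = {
    "good":  [1.0, 0.0, 0.5, 0.0],
    "great": [0.9, 0.1, 0.5, 0.0],   # never appears in the labeled data
    "bad":   [-1.0, 0.0, 0.5, 0.0],
    "awful": [-0.9, 0.1, 0.5, 0.0],  # never appears in the labeled data
    "movie": [0.0, 1.0, 0.0, 0.3],
}
# Tiny labeled set: 1 = positive, 0 = negative.
data = [(["good", "movie"], 1), (["bad", "movie"], 0)]

S, v = train_subspace(E, data, sub_dim=2)
p_great = predict(E, S, v, ["great"])
p_awful = predict(E, S, v, ["awful"])
```

Because `S` has far fewer parameters than the full embedding table, it is less likely to overfit a small labeled set, and in this toy setting the unseen words "great" and "awful" inherit the learned polarity from their neighbors in the original embedding space.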
Pages: 1074-1084
Page count: 11
Related papers
50 in total
  • [41] Efficient Graph Learning From Noisy and Incomplete Data
    Berger, Peter
    Hannak, Gabor
    Matz, Gerald
    IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2020, 6 : 105 - 119
  • [42] Emerging Topics in Learning from Noisy and Missing Data
    Alameda-Pineda, Xavier
    Hospedales, Timothy M.
    Ricci, Elisa
    Sebe, Nicu
    Wang, Xiaogang
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 1469 - 1470
  • [43] Learning from Noisy Pairwise Similarity and Unlabeled Data
    Wu, Songhua
    Liu, Tongliang
    Han, Bo
    Yu, Jun
    Niu, Gang
    Sugiyama, Masashi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [44] Reinforcement Learning for Relation Classification from Noisy Data
    Feng, Jun
    Huang, Minlie
    Zhao, Li
    Yang, Yang
    Zhu, Xiaoyan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5779 - 5786
  • [45] Learning MDL Logic Programs from Noisy Data
    Hocquette, Celine
    Niskanen, Andreas
    Jarvisalo, Matti
    Cropper, Andrew
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10553 - 10561
  • [46] Guest Editorial Learning From Noisy Multimedia Data
    Zhang, Jian
    Hanjalic, Alan
    Jain, Ramesh
    Hua, Xiansheng
    Satoh, Shin'ichi
    Yao, Yazhou
    Zeng, Dan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1247 - 1252
  • [47] Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data
    Fang, Anjie
    Macdonald, Craig
    Ounis, Iadh
    Habel, Philip
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 1057 - 1060
  • [48] An algorithm of wavelet network learning from noisy data
    Zhang, Zhiguo
    San, Ye
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 2746 - +
  • [49] Tourism Recommendation based on Word Embedding from Card Transaction Data
    Hong, Minsung
    Chung, Namho
    Koo, Chulmo
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2023, 20 (03) : 911 - 931
  • [50] Physics-informed learning of governing equations from scarce data
    Zhao Chen
    Yang Liu
    Hao Sun
    Nature Communications, 12