Distinguishability Calibration to In-Context Learning

Cited by: 0
Authors
Li, Hongjing [1 ]
Yan, Hanqi [1 ]
Li, Yanran
Qian, Li [2 ]
He, Yulan [1 ,3 ,4 ]
Gui, Lin [3 ]
Affiliations
[1] Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England
[2] Xiaomi AI Lab, Beijing, Peoples R China
[3] Kings Coll London, Dept Informat, London, England
[4] Alan Turing Inst, London, England
Source
17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | 2023
Funding
UK Research and Innovation; Engineering and Physical Sciences Research Council (UK); National Science Foundation (US);
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed increasing interest in prompt-based learning, in which models can be trained on only a few annotated instances, making them suitable for low-resource settings. When using prompt-based learning for text classification, the goal is to use a pretrained language model (PLM) to predict a missing token in a pre-defined template given an input text, where the predicted token can be mapped to a class label. However, PLMs built on the transformer architecture tend to generate similar output embeddings, making it difficult to discriminate between different class labels. The problem is further exacerbated when dealing with classification tasks involving many fine-grained class labels. In this work, we alleviate this information diffusion issue, i.e., different tokens share a large proportion of similar information after going through multiple stacked self-attention layers in a transformer, by proposing a calibration method built on feature transformations, through rotation and scaling, to map a PLM-encoded embedding into a new metric space that guarantees the distinguishability of the resulting embeddings. Furthermore, we take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings via a coarse-to-fine metric learning strategy, further enhancing the distinguishability of the learned output embeddings. Extensive experiments on three datasets under various settings demonstrate the effectiveness of our approach.
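The abstract names two mechanisms: a rotation-and-scaling transformation that maps PLM output embeddings into a new metric space, and hyperbolic (Poincaré) distances for coarse-to-fine metric learning over class-associated embeddings. The sketch below is a minimal, hypothetical PyTorch rendering of these two ideas under stated assumptions, not the authors' released implementation; the names `CalibrationLayer` and `poincare_distance`, the orthogonal-parametrization choice, and the hidden size of 768 are all illustrative.

```python
# Hypothetical sketch of the two ideas named in the abstract, assuming
# PyTorch >= 1.10. All names and the hidden size are illustrative; this is
# not the paper's released code.
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations


class CalibrationLayer(nn.Module):
    """Rotation-and-scaling calibration: maps PLM-encoded embeddings into a
    new metric space to spread otherwise similar outputs apart."""

    def __init__(self, dim: int):
        super().__init__()
        # A linear map constrained to be orthogonal acts as a pure rotation
        # (or reflection) of the embedding space.
        self.rotation = parametrizations.orthogonal(nn.Linear(dim, dim, bias=False))
        # Learnable per-dimension scaling, parameterized in log space so the
        # scale factors stay positive.
        self.log_scale = nn.Parameter(torch.zeros(dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, dim) PLM output embeddings, e.g. the [MASK]-token states.
        return self.rotation(h) * self.log_scale.exp()


def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Distance in the Poincare ball, the standard hyperbolic metric for
    hierarchical relations. Inputs must lie inside the unit ball
    (||u|| < 1 and ||v|| < 1)."""
    sq_dist = ((u - v) ** 2).sum(dim=-1)
    denom = ((1 - (u ** 2).sum(dim=-1)) * (1 - (v ** 2).sum(dim=-1))).clamp_min(eps)
    return torch.acosh(1 + 2 * sq_dist / denom)


if __name__ == "__main__":
    layer = CalibrationLayer(dim=768)
    h = torch.randn(4, 768)                      # stand-in for PLM [MASK] embeddings
    z = layer(h)                                 # calibrated embeddings, shape (4, 768)
    # Project into the open unit ball before taking hyperbolic distances.
    z_ball = z / (1 + z.norm(dim=-1, keepdim=True))
    print(poincare_distance(z_ball[0], z_ball[1]).item())
```

Under these assumptions, a metric-learning loss would pull same-class embeddings together and push different classes apart under the hyperbolic distance, with coarse labels supervising before fine ones; the paper itself should be consulted for the actual training objective.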
Pages: 1385-1397
Number of pages: 13