Few-shot intent detection with mutual information and contrastive learning

Cited by: 1
Authors
Yang, Shun [1 ]
Du, YaJun [1 ]
Huang, JiaMing [1 ]
Li, XianYong [1 ]
Du, ShangYi [2 ]
Liu, Jia [1 ]
Li, YanLi [1 ]
Affiliations
[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610065, Sichuan, Peoples R China
[2] McGill Univ, Sch Stat & Comp Sci, Montreal, PQ H3A 0G4, Canada
Keywords
Few-shot learning; Intent detection; Mutual information; Contrastive learning; Meta-learning
DOI
10.1016/j.asoc.2024.112338
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Few-shot intent detection is a challenging task. Most existing methods focus only on acquiring generalizable knowledge from known classes or on adapting to meta-learning tasks. However, these methods neither learn the information most beneficial for intent detection nor learn discriminative representations of intent keywords. Solving these problems can substantially improve a model's ability to recognize the intent of text and thus execute user commands correctly. Therefore, in this paper, we propose a few-shot intent detection framework based on Mutual Information and Contrastive Learning (MICL). It makes the model's learning of intent keywords more robust through mutual information maximization, thereby improving the accuracy of intent detection. Maximizing mutual information also enhances the consistency of intent representations across two text views, which in turn improves the efficiency of sample-level contrastive learning. Specifically, we use a neural network to approximate the joint and marginal probability distributions of the two text views and maximize their mutual information through back-propagation. To make the model focus on intent keywords in text, we introduce sample-level contrastive learning, which helps the model learn discriminative representations of intent keywords by comparing the similarity of text across the two views. In addition, to prevent overfitting on known classes, we introduce class-level contrastive learning, which acts as a regularizer over the known classes and improves generalization to unknown classes. Our method achieves state-of-the-art performance on four public datasets.
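The abstract's two core training signals lend themselves to a compact illustration: a statistics network that scores aligned versus shuffled pairs of view embeddings to obtain a neural lower bound on mutual information (in the spirit of MINE, Belghazi et al., 2018), and an InfoNCE-style sample-level contrastive loss between the two text views. The PyTorch sketch below is a hedged illustration under those assumptions, not the authors' implementation; the estimator architecture, the temperature tau, the loss weighting, and the names MIEstimator, mi_lower_bound, and sample_contrastive_loss are all placeholders, and the class-level contrastive regularizer is omitted.

```python
# Minimal sketch, assuming a MINE-style estimator and an InfoNCE-style
# view-contrastive loss; illustrative only, not the authors' code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class MIEstimator(nn.Module):
    """Statistics network T(u, v) that scores a pair of view embeddings.

    Aligned pairs stand in for samples of the joint distribution;
    shuffled pairs stand in for the product of the marginals.
    """

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, u, v):
        return self.net(torch.cat([u, v], dim=-1)).squeeze(-1)


def mi_lower_bound(t_net, z1, z2):
    """Donsker-Varadhan lower bound on I(z1; z2), maximized by back-propagation."""
    joint = t_net(z1, z2).mean()                         # aligned (joint) pairs
    perm = torch.randperm(z2.size(0), device=z2.device)  # shuffle to break pairing
    marginal = torch.logsumexp(t_net(z1, z2[perm]), dim=0) - math.log(z2.size(0))
    return joint - marginal


def sample_contrastive_loss(z1, z2, tau=0.07):
    """InfoNCE-style sample-level loss: the two views of each utterance
    attract; all other in-batch utterances repel."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = (z1 @ z2.t()) / tau                         # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


# Toy usage: z1, z2 play the role of encoder outputs for two views of a batch.
z1 = torch.randn(8, 128, requires_grad=True)
z2 = torch.randn(8, 128, requires_grad=True)
t_net = MIEstimator(dim=128)
loss = -mi_lower_bound(t_net, z1, z2) + sample_contrastive_loss(z1, z2)
loss.backward()
```

In a full training loop, the negative MI bound and the contrastive terms would presumably be summed with task-specific weights alongside the meta-learning classification objective; the record does not specify that weighting.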
Pages: 16