Few-shot intent detection with mutual information and contrastive learning

Cited: 1
Authors
Yang, Shun [1 ]
Du, YaJun [1 ]
Huang, JiaMing [1 ]
Li, XianYong [1 ]
Du, ShangYi [2 ]
Liu, Jia [1 ]
Li, YanLi [1 ]
Affiliations
[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610065, Sichuan, Peoples R China
[2] McGill Univ, Sch Stat & Comp Sci, Montreal, PQ H3A 0G4, Canada
Keywords
Few-shot learning; Intent detection; Mutual information; Contrastive learning; Meta-learning;
DOI
10.1016/j.asoc.2024.112338
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot intent detection is a challenging task. Most existing methods focus only on acquiring generalizable knowledge from known classes or on adapting to meta-learning tasks. These methods neither learn the information most beneficial for intent detection nor learn discriminative representations of intent keywords. Addressing these gaps can substantially improve a model's ability to recognize textual intent and thus correctly execute user commands. In this paper, we therefore propose a few-shot intent detection framework based on Mutual Information and Contrastive Learning (MICL). It improves the robustness with which the model learns intent keywords through mutual information maximization, thereby improving intent detection accuracy. Maximizing mutual information also enhances the consistency of intent representations across two text views, which improves the efficiency of sample-level contrastive learning. Specifically, we use a neural network to approximate the joint and marginal probability distributions of the two text views and maximize their mutual information through back-propagation. To make the model focus on intent keywords in the text, we introduce sample-level contrastive learning, which learns discriminative representations of intent keywords by comparing the similarity of texts across the two views. In addition, to prevent overfitting on known classes, we introduce class-level contrastive learning, which acts as a regularizer over known classes and improves generalization to unknown classes. Our method achieves state-of-the-art performance on four public datasets.
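The sample-level objective described in the abstract can be illustrated with a generic InfoNCE-style contrastive loss over paired text-view embeddings. This is a minimal NumPy sketch, not the paper's exact formulation: the function name `info_nce`, the temperature `tau`, and the toy embeddings are illustrative assumptions. InfoNCE is itself a lower bound on the mutual information between the two views, which is one reason a single objective can serve both the MI-maximization and contrastive roles the abstract describes.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Generic InfoNCE sketch: row i of z1 and row i of z2 are two views
    of the same text (positives); all other rows act as negatives."""
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                    # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()              # cross-entropy on matched pairs

# Toy check: correctly aligned views incur a lower loss than shuffled views.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
shuffled = info_nce(z, z[::-1])
```

The class-level variant sketched in the abstract would differ mainly in the choice of positives: instead of the other view of the same sample, any sample of the same known class counts as a positive, which is what lets it act as a regularizer over known classes.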
Pages: 16