Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification

Cited by: 16
Authors
Shen, Peng [1 ]
Lu, Xugang [1 ]
Li, Sheng [1 ]
Kawai, Hisashi [1 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Koganei, Tokyo, Japan
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
Task analysis; Speech processing; Training; Speech recognition; Neural networks; Feature extraction; Robustness; Internal representation learning; knowledge distillation; short utterances; spoken language identification;
DOI
10.1109/TASLP.2020.3023627
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
With the successful application of deep feature learning algorithms, spoken language identification (LID) on long utterances achieves satisfactory performance. However, performance on short utterances degrades drastically, even when the LID system is trained on short utterances. The main reason is the large variation of the representations of short utterances, which results in high model confusion. To narrow the performance gap between long and short utterances, we propose a teacher-student representation learning framework based on a knowledge distillation method to improve LID performance on short utterances. In the proposed framework, in addition to training the student model on short utterances with their true labels, the internal representation at the output of a hidden layer of the student model is supervised by the teacher's representation of the corresponding longer utterances. By reducing the distance between the internal representations of short and long utterances, the student model can learn robust, discriminative representations for short utterances, which is expected to reduce model confusion. We conducted experiments on our in-house LID dataset and the NIST LRE07 dataset, and showed the effectiveness of the proposed methods for short-utterance LID tasks.
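
The abstract describes a two-term training objective: standard cross-entropy on the student's short-utterance predictions, plus a distance penalty pulling the student's hidden-layer representation of a short utterance toward the teacher's representation of the corresponding long utterance. The following is a minimal PyTorch sketch of that objective under stated assumptions: the LidNet architecture, the choice of distilled layer, the MSE distance, the weight alpha, and all dimensions are illustrative, not the authors' actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LidNet(nn.Module):
    # Toy LID model (hypothetical, not the paper's architecture):
    # frame encoder -> temporal mean pooling -> embedding -> language classifier.
    def __init__(self, feat_dim=40, emb_dim=256, num_langs=14):
        super().__init__()
        self.frame_enc = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.embed = nn.Linear(512, emb_dim)      # hidden layer whose output is distilled
        self.classifier = nn.Linear(emb_dim, num_langs)

    def forward(self, x):                         # x: (batch, frames, feat_dim)
        h = self.frame_enc(x).mean(dim=1)         # pool over frames -> (batch, 512)
        rep = self.embed(h)                       # internal representation
        return rep, self.classifier(rep)

teacher = LidNet()                                # assumed pretrained on long utterances
student = LidNet()
teacher.eval()                                    # teacher stays frozen during distillation

def distill_loss(short_feats, long_feats, labels, alpha=0.5):
    # alpha is a hypothetical trade-off weight between the two loss terms.
    with torch.no_grad():
        t_rep, _ = teacher(long_feats)            # teacher rep of the long utterance
    s_rep, s_logits = student(short_feats)        # student rep/logits of its short segment
    ce = F.cross_entropy(s_logits, labels)        # supervision with true language labels
    kd = F.mse_loss(s_rep, t_rep)                 # shrink short-vs-long representation distance
    return ce + alpha * kd

# Dummy batch: 8 short segments (100 frames) cut from 8 long utterances (1000 frames).
short = torch.randn(8, 100, 40)
long_ = torch.randn(8, 1000, 40)
labels = torch.randint(0, 14, (8,))
distill_loss(short, long_, labels).backward()

Wrapping the teacher in torch.no_grad() acts as a stop-gradient, so only the student is pulled toward the long-utterance representations, matching the one-way supervision described in the abstract.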
Pages: 2674 - 2683
Page count: 10
Related Papers
50 records in total
  • [31] KD-INR: Time-Varying Volumetric Data Compression via Knowledge Distillation-Based Implicit Neural Representation
    Han, Jun
    Zheng, Hao
    Bi, Chongke
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (10) : 6826 - 6838
  • [32] Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning
    Tian, Sanli
    Deng, Keqi
    Li, Zehan
    Ye, Lingxuan
    Cheng, Gaofeng
    Li, Ta
    Yan, Yonghong
    INTERSPEECH 2022, 2022, : 2633 - 2637
  • [33] Knowledge Distillation-Based Robust UAV Swarm Communication Under Malicious Attacks
    Wu, Qirui
    Zhang, Yirun
    Yang, Zhaohui
    Shikh-Bahaei, Mohammad
    2024 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS, ICC WORKSHOPS 2024, 2024, : 1023 - 1029
  • [34] Facial landmark points detection using knowledge distillation-based neural networks
    Fard, Ali Pourramezan
    Mahoor, Mohammad H.
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 215
  • [35] Toward Extremely Lightweight Distracted Driver Recognition With Distillation-Based Neural Architecture Search and Knowledge Transfer
    Liu, Dichao
    Yamasaki, Toshihiko
    Wang, Yu
    Mase, Kenji
    Kato, Jien
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (01) : 764 - 777
  • [36] Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network
    Han, Jaemin
    Kim, Min Sik
    Kim, Hyung Soon
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (01) : 64 - 68
  • [37] MUFTI: Multi-Domain Distillation-Based Heterogeneous Federated Continuous Learning
    Gai, Keke
    Wang, Zijun
    Yu, Jing
    Zhu, Liehuang
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 2721 - 2733
  • [38] DURATION-NORMALIZED FEATURE SELECTION FOR INDIAN SPOKEN LANGUAGE IDENTIFICATION IN UTTERANCE LENGTH MISMATCH
    Bakshi, Aarti M.
    Kopparapu, Sunil K.
JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY, 2022, 17 (03) : 2120 - 2134
  • [39] Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning
    Shen, Jiyuan
    Yang, Wenzhuo
    Chu, Zhaowei
    Fan, Jiani
    Niyato, Dusit
    Lam, Kwok-Yan
    ICC 2024 - IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2024, : 2034 - 2039
  • [40] Representation Learning and Knowledge Distillation for Lightweight Domain Adaptation
    Bin Shah, Sayed Rafay
    Putty, Shreyas Subhash
    Schwung, Andreas
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 1202 - 1207