Speaker Recognition Based on 3DCNN-LSTM

Times cited: 0

Authors:

Hu, ZhangFang [1]

Si, XingTong [2]

Luo, Yuan [1]

Tang, ShanShan [2]

Jian, Fang [2]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Sch Optoelect Engn, Key Lab Opt Informat Sensing & Technol, Chongqing 400065, Peoples R China

[2] Chongqing Univ Posts & Telecommun, Sch Optoelect Engn, Dept Elect Sci & Technol, Chongqing 400065, Peoples R China

Source:

ENGINEERING LETTERS | 2021 / Vol. 29 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

speaker recognition; semi-text processing; 3DCNN; LSTM; GMM;

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Subject classification code:

08;

Abstract:

Traditional speaker recognition methods reduce the feature signal from a high-dimensional to a low-dimensional representation, which often discards part of the speaker information and lowers the recognition rate. To address this problem, this paper proposes a model that combines a 3D convolutional neural network (3DCNN) with a long short-term memory (LSTM) network. First, the model takes fixed-step speech feature vectors as the 3DCNN input. This converts text-independent speaker recognition into a "semi-text"-dependent mode that preserves much more of each speaker's speech features and thus increases the differences between the characteristics of different speakers. Second, the 3D convolution kernel designed in this paper extracts speaker-specific characteristics along different dimensions to further separate speakers. The convolution output is then fed to the LSTM network as a time series to strengthen the contextual dependencies in the speaker's voice, and finally the classification output is labeled to complete the speaker recognition system. Experimental results on short utterances from the AISHELL-1 dataset show that this model achieves a higher speaker recognition rate than traditional algorithms and popular embedding features, and that the system is more robust over time.
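A minimal Python sketch of one plausible reading of the first two steps in the abstract: a (frames × coefficients) feature matrix is cut into fixed-length segments that start every `step` frames, the segments are stacked into a 3D volume, and a single 3D kernel is slid across all three axes. The helper names `segment_fixed_step` and `conv3d_valid`, as well as the segment length, step, and kernel values, are hypothetical. The abstract does not specify the authors' input layout, kernel sizes, or strides, so this illustrates the technique rather than the paper's implementation.

```python
# Hypothetical sketch: fixed-step segmentation of speech features into a 3D
# volume, followed by one single-channel 3D convolution. Not the authors' code.
from typing import List

Matrix = List[List[float]]         # (num_frames x num_coeffs), e.g. MFCC-like features
Volume = List[List[List[float]]]   # (depth x height x width)


def segment_fixed_step(features: Matrix, frames_per_segment: int, step: int) -> Volume:
    """Cut the feature matrix into segments of `frames_per_segment` frames that
    start every `step` frames, and stack them along a new leading axis, giving
    a (num_segments x frames_per_segment x num_coeffs) volume.
    Assumption: trailing frames that do not fill a whole segment are dropped."""
    if step <= 0 or frames_per_segment <= 0:
        raise ValueError("step and frames_per_segment must be positive")
    segments: Volume = []
    for start in range(0, len(features) - frames_per_segment + 1, step):
        # Copy each row so the volume does not alias the input matrix.
        segments.append([row[:] for row in features[start:start + frames_per_segment]])
    return segments


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation with stride 1 and no padding
    (the kernel is not flipped, as in most deep-learning libraries)."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for d in range(D - kd + 1):
        plane = []
        for h in range(H - kh + 1):
            row = []
            for w in range(W - kw + 1):
                # Weighted sum over the kd x kh x kw neighbourhood: the kernel
                # mixes information across segments, frames and coefficients.
                acc = 0.0
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            acc += volume[d + i][h + j][w + k] * kernel[i][j][k]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Usage: 6 frames of 3 coefficients, 4-frame segments every frame, 2x2x2 kernel.
feats = [[t * 10 + c for c in range(3)] for t in range(6)]
vol = segment_fixed_step(feats, frames_per_segment=4, step=1)   # 3 x 4 x 3
ones = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
fmap = conv3d_valid(vol, ones)                                  # 2 x 3 x 2
print(fmap[0][0][0])  # 84.0
```

In a trained 3DCNN the kernel weights are learned and many kernels run in parallel, each typically followed by a nonlinearity; the resulting feature maps would then be read as a sequence by the LSTM. Which axis the authors treat as the time axis for the LSTM is an assumption not settled by the abstract.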


Pages: 463-470

Number of pages: 8
