Modified layer deep convolution neural network for text-independent speaker recognition

Cited by: 9

Authors:

Karthikeyan, V [1]

Priyadharsini, Suja S. [2]

Affiliations:

[1] Kalasalingam Inst Technol, Dept Elect & Commun Engn, Krishnankoil, Tamil Nadu, India

[2] Anna Univ, Dept Elect & Commun Engn, Reg Campus Tirunelveli, Tirunelveli, Tamil Nadu, India

Source:

JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE | 2024 / Volume 36 / Issue 02

Keywords:

Speaker identification; deep learning; CNN; spectrogram; MFCC;

DOI:

10.1080/0952813X.2022.2092560

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker recognition is the task of automatically identifying a speaker from speaker-specific features. It is a popular and actively studied topic in speech technology, with applications in areas such as forensics, authentication, and security. In this work, a modified deep convolutional neural network (CNN) structure is proposed for speaker identification, with improved convolution, activation, and pooling layers and the Adam optimiser. Compared with a generic CNN scheme, the proposed architecture increases prediction accuracy and reduces the loss. The architecture is validated on several datasets, and the results show that the modified CNN outperforms other state-of-the-art models in both accuracy (average 99%) and loss (average 1%). The analysis finds that the modified CNN is well suited to real-time speaker identification, because its efficacy does not degrade under the noise and interference introduced by the recording environment. Relevance of the work: Speaker recognition is an area in which ML and DL schemes, when combined, have considerable potential for automation and authentication. A modified CNN can improve the process by reducing problems such as false positives and background noise. The approach could be extended to a raga identification and therapy mechanism for use in treating diseases.


Pages: 273-285

Page count: 13

References: 44 in total

