Enhanced Indonesian Ethnic Speaker Recognition using Data Augmentation Deep Neural Network

Cited by: 12

Authors:

Nugroho, Kristiawan [1,2]

Noersasongko, Edi [1]

Purwanto [1]

Muljono [1]

Setiadi, De Rosal Ignatius Moses [1]

Affiliations:

[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia

[2] AMIK Jakarta Teknol Cipta, Semarang, Indonesia

Source:

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES | 2022 / Vol. 34 / Issue 07

Keywords:

Speaker Recognition; Data Augmentation; Deep Neural Network; Indonesian Ethnic; Adding White Noise; Pitch Shifting; Time Stretching; SPEECH RECOGNITION; CLASSIFICATION;

DOI:

10.1016/j.jksuci.2021.04.002

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Speaker recognition is a challenging topic in speech processing research. Previously proposed models have achieved fairly high accuracy. However, speaker recognition accuracy has not yet been maximized, because small datasets remain a problem that causes overfitting and biased data samples. This work proposes a data augmentation strategy that combines adding white noise, pitch shifting, and time stretching, with the augmented data processed by a deep neural network to produce a new speaker recognition model called DA-DNN7L. Data augmentation is used to increase the limited quantity of Indonesian ethnic speaker data, while the seven-layer DNN is the architecture that provided the best accuracy compared with other multilayer models; seven-layer approaches have also achieved high accuracy in several other studies. The best-performing configuration of the seven-layer data augmentation deep neural network reached an accuracy of 99.76% and a loss of 0.05 with a 70%:30% split ratio and the addition of 400 augmentation data. These results indicate that the data augmentation deep neural network can improve speaker recognition performance on the Indonesian ethnic dataset. © 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University.
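As a rough illustration of the augmentation idea named in the abstract, the sketch below adds Gaussian white noise to a waveform at a chosen signal-to-noise ratio and applies a 70%:30% split. It is not the authors' implementation: the function names, the 20 dB SNR, the synthetic sine tone standing in for a speech recording, and the choice to augment only the training portion are all assumptions made for illustration. Pitch shifting and time stretching are omitted because they normally rely on a phase vocoder or a dedicated audio library.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return a copy of `signal` with Gaussian white noise at `snr_db` (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for the noise power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) of `noisy` relative to `clean`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def split_70_30(items, rng):
    """Shuffle a copy of `items` and split it into 70% / 30% parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_rate = 16000     # assumed sampling rate in Hz

# Hypothetical stand-in for one recording: one second of a 220 Hz tone.
tone = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]

# Ten placeholder "recordings", split 70:30 before augmentation (an assumption).
recordings = [tone for _ in range(10)]
train, test = split_70_30(recordings, rng)

# One noisy copy per training recording roughly doubles the training set.
augmented_train = train + [add_white_noise(x, snr_db=20, rng=rng) for x in train]
```

Lower `snr_db` values give noisier copies; which noise levels help a given model is something to determine experimentally.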


Pages: 4375-4384

Page count: 10

