A Near Real-Time Automatic Speaker Recognition Architecture for Voice-Based User Interface

Cited by: 32
Authors
Dhakal, Parashar [1]
Damacharla, Praveen [2]
Javaid, Ahmad Y. [1]
Devabhaktuni, Vijay [2]
Affiliations
[1] Univ Toledo, Elect Engn & Comp Sci Dept, Toledo, OH 43606 USA
[2] Purdue Univ Northwest, ECE Dept, Hammond, IN 46323 USA
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION | 2019 / Volume 1 / Issue 01
Keywords
classifiers; convolution neural network; architecture; feature extraction; machine learning; random forest; speaker recognition; voice interface
DOI
10.3390/make1010031
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a novel pipelined near real-time speaker recognition architecture that enhances the performance of speaker recognition by exploiting the advantages of hybrid feature extraction techniques that contain the features of Gabor Filter (GF), Convolution Neural Networks (CNN), and statistical parameters as a single matrix set. This architecture has been developed to enable secure access to a voice-based user interface (UI) by enabling speaker-based authentication and integration with an existing Natural Language Processing (NLP) system. Gaining secure access to existing NLP systems also served as motivation. Initially, we identify challenges related to real-time speaker recognition and highlight the recent research in the field. Further, we analyze the functional requirements of a speaker recognition system and introduce the mechanisms that can address these requirements through our novel architecture. Subsequently, the paper discusses the effect of different techniques such as CNN, GF, and statistical parameters in feature extraction. For the classification, standard classifiers such as Support Vector Machine (SVM), Random Forest (RF) and Deep Neural Network (DNN) are investigated. To verify the validity and effectiveness of the proposed architecture, we compared different parameters including accuracy, sensitivity, and specificity with the standard AlexNet architecture.
Pages: 504-520
Number of pages: 17