Speaker identification features extraction methods: A systematic review

Cited by: 100
Authors
Tirumala, Sreenivas Sremath [1, 3]
Shahamiri, Seyed Reza [1]
Garhwal, Abhimanyu Singh [1, 3]
Wang, Ruili [2, 4]
Affiliations
[1] Manukau Inst Technol, Fac Business & Informat Technol, Auckland, New Zealand
[2] Massey Univ, INMS, Comp Sci & Informat Technol, Auckland, New Zealand
[3] MIT Manukau, Cnr Manukau Stn Rd Davies Ave,Private Bag 94006, Manukau 2241, New Zealand
[4] Massey Univ, Room 3-10,IIMS Bldg,Albany Campus, Auckland, New Zealand
Keywords
Feature extraction; Kitchenham systematic review; MFCC; Speaker identification; Speaker recognition; ARTIFICIAL NEURAL-NETWORKS; SPEECH RECOGNITION; MFCC; VERIFICATION; ROBUSTNESS; HISTOGRAM;
DOI
10.1016/j.eswa.2017.08.015
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker Identification (SI) is the process of identifying the speaker from a given utterance by comparing the voice biometrics of the utterance with utterance models stored beforehand. SI technologies have taken a new direction due to advances in artificial intelligence and have been used widely in various domains. Feature extraction is one of the most important aspects of SI and significantly influences the SI process and performance. This systematic review was conducted to identify, compare, and analyze various feature extraction approaches, methods, and algorithms of SI, and to provide a reference on feature extraction approaches for SI applications and future studies. The review was conducted according to the Kitchenham systematic review methodology and guidelines, and provides an in-depth analysis of proposals and implementations of SI feature extraction methods discussed in the literature between 2011 and 2016. Three research questions were determined, and an initial set of 535 publications was identified to answer them. After applying exclusion criteria, 160 related publications were shortlisted and reviewed in this paper; these papers were considered to answer the research questions. Results indicate that pure Mel-Frequency Cepstral Coefficients (MFCCs) based feature extraction approaches have been used more than any other approach. Furthermore, other MFCC variations, such as MFCC fusion and cleansing approaches, have proven to be very popular as well. This study identified that the current SI research trend is to develop a robust universal SI framework to address the important problems of SI such as adaptability, complexity, multi-lingual recognition, and noise robustness. The results presented in this research are based on past publications, citations, and number of implementations, with citations being most relevant. This paper also presents the general process of SI. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 250-271
Number of pages: 22
Related papers
50 in total
  • [1] Feature Extraction Methods for Speaker Recognition: A Review
    Chaudhary, Gopal
    Srivastava, Smriti
    Bhardwaj, Saurabh
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (12)
  • [2] PHYSIOLOGICALLY-MOTIVATED FEATURE EXTRACTION FOR SPEAKER IDENTIFICATION
    Wang, Jianglin
    Johnson, Michael T.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] Robust Q Features for Speaker Identification
    Deshpande, Mangesh S.
    Holambe, Raghunath S.
    2009 INTERNATIONAL CONFERENCE ON ADVANCES IN RECENT TECHNOLOGIES IN COMMUNICATION AND COMPUTING (ARTCOM 2009), 2009, : 209 - 213
  • [4] Optimal MFCC Features Extraction by Differential Evolution Algorithm for Speaker Recognition
    Sadeghi, Mohsen
    Marvi, Hossein
    2017 3RD IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2017, : 169 - 173
  • [5] A Review on Feature Extraction for Speaker Recognition under Degraded Conditions
    Disken, Gokay
    Tufekci, Zekeriya
    Saribulut, Lutfu
    Cevik, Ulus
    IETE TECHNICAL REVIEW, 2017, 34 (03) : 321 - 332
  • [6] A network model of speaker identification with new feature extraction methods and BLSTM
    Wang, Xingmei
    Xue, Fuzhao
    Wang, Wei
    Liu, Anhua
    NEUROCOMPUTING, 2020, 403 (403) : 167 - 181
  • [7] Speaker Identification Using HHT Spectrum Features
    Liu, Jia-Wei
    Wang, Jia-Ching
    Lin, Chang-Hong
    2011 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI 2011), 2011, : 145 - 148
  • [8] VAD, feature extraction and modelling techniques for speaker recognition: a review
    Jainar, Spoorti J.
    Sale, Pritam Limbaji
    Nagaraja, B. G.
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2020, 12 (1-2) : 1 - 18
  • [9] A Hybrid GRU-CNN Feature Extraction Technique for Speaker Identification
    Shihab, Md Shazzad Hossain
    Aditya, Shuvra
    Setu, Jahangir Hossain
    Imtiaz-Ud-Din, K. M.
    Efat, Md Iftekharul Alam
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [10] Learning Discriminative Features for Speaker Identification and Verification
    Yadav, Sarthak
    Rai, Atul
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2237 - 2241