Feature selection for speaker verification using genetic programming

Cited by: 11
Authors
Loughran R. [1]
Agapitos A. [1]
Kattan A. [2]
Brabazon A. [1]
O’Neill M. [1]
Affiliations
[1] Natural Computing Research and Applications Group (NCRA), University College Dublin, Dublin
[2] Computer Science Department, Um Al-Qura University, Mecca
Funding
Science Foundation Ireland
Keywords
Feature selection; Genetic programming; Speaker verification; Unbalanced data
DOI
10.1007/s12065-016-0150-5
Abstract
We present a study examining feature selection from high-performing models evolved using genetic programming (GP) on the problem of automatic speaker verification (ASV). ASV is a highly unbalanced binary classification problem in which a given speaker must be verified against all other speakers. We evolve classification models for 10 individual speakers using a variety of fitness functions and data sampling techniques and examine the generalisation of each model on a 1:9 unbalanced set. A significant difference between training and test performance is found, which may indicate overfitting in the models. Using only the best-generalising models, we examine two methods for selecting the most important features. We compare the performance of a number of tuned machine learning classifiers using the full set of 275 features and reduced sets of 20 features obtained from each of the two feature selection methods. Results show that, in the majority of experiments undertaken, using only the top 20 features found in high-performing GP programs led to test classifications that are as good as, or better than, those obtained using the full feature set. Classification accuracy varies considerably between speakers across all experiments, showing that some speakers are easier to classify than others. This indicates that in such real-world classification problems, the content and quality of the original data strongly influence the quality of the results obtainable. © 2017, Springer-Verlag Berlin Heidelberg.
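The abstract does not name the fitness functions or the two feature-selection methods, so the following is only a hedged illustration under stated assumptions, not the authors' method. It shows two ideas the abstract relies on. First, why plain accuracy is misleading on a 1:9 set: labelling every sample as an impostor already scores 90%, while a balanced measure (mean per-class recall) scores 0.5. Second, one simple way to pick a top-k feature subset from evolved programs, assuming (hypothetically) that each best-generalising GP program is summarised by the list of feature indices appearing in its terminals, and ranking features by how many programs use them. The helper names `balanced_accuracy` and `top_k_features` are invented for this example.

```python
from collections import Counter


def plain_accuracy(y_true, y_pred):
    """Fraction of samples labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; labels: 1 = target speaker, 0 = impostor."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def top_k_features(programs, k=20):
    """Rank features by the number of programs that use them.

    Each program is a (hypothetical) list of feature indices taken from its
    terminals; a feature is counted at most once per program. Ties are broken
    by ascending feature index so the result is deterministic.
    """
    counts = Counter()
    for used in programs:
        counts.update(set(used))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ranked[:k]]


if __name__ == "__main__":
    # A 1:9 unbalanced test set: 10 target-speaker samples, 90 impostors.
    y_true = [1] * 10 + [0] * 90
    always_impostor = [0] * 100
    print(plain_accuracy(y_true, always_impostor))     # 0.9
    print(balanced_accuracy(y_true, always_impostor))  # 0.5

    # Toy "best programs", each listed by the feature indices it uses.
    programs = [[3, 7, 3, 12], [7, 12, 40], [7, 3], [99]]
    print(top_k_features(programs, k=3))               # [7, 3, 12]
```

Counting each feature once per program, rather than once per occurrence, stops a single bloated program from dominating the ranking; counting raw occurrences is an equally plausible alternative.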
Pages: 1-21
Page count: 20