Noise-robust Hidden Markov Models for limited training data for within-species bird phrase classification

Cited by: 0
Authors
Kaewtip, Kantapon [1]
Taylor, Charles [2]
Alwan, Abeer [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90024 USA
[2] Univ Calif Los Angeles, Dept Ecol & Evolutionary Biol, Los Angeles, CA USA
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Funding
National Science Foundation (USA);
Keywords
Hidden Markov Models (HMMs); limited data; noise-robust; bird phrase classification; SPEECH RECOGNITION; CONTINUOUS RECORDINGS;
DOI
10.21437/Interspeech.2016-1360
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Hidden Markov Models (HMMs) have been studied and used extensively in speech and birdsong recognition, but they are not robust to limited training data and noise. This paper presents two novel approaches to training continuous and discrete HMMs with extremely limited data. First, the algorithm learns global Gaussian Mixture Models (GMMs) from all available training phrases. The GMM parameters are then used to initialize the state parameters of each individual model. For the GMM-HMM framework, the number of states and the number of mixture components for each state are determined by the acoustic variation of each phrase type. The (high-energy) time-frequency prominent regions are used to compute the state emission probability to increase noise-robustness. For the discrete HMM framework, the probability distribution of each state is initialized by the global GMMs in training. In testing, the probability of each codebook is estimated using the prominent regions of each state to increase noise-robustness. In Cassin's Vireo phrase classification using 75 phrase types, the new GMM-HMM approach achieves 79.5% and 87% classification accuracy using 1 and 2 phrases, respectively, while HTK's GMM-HMM framework performs at chance level (1.33% accuracy, i.e. 1 in 75). The performance of the other algorithm is presented in the paper.
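The initialization idea described in the abstract (fit one global GMM on all training frames, then seed each HMM state from it) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the GMM uses diagonal covariances fitted with plain EM, the number of states is a fixed input rather than derived from acoustic variation, and each state is assumed to cover an equal-length time segment of the phrase and to copy the single best-matching global component.

```python
import numpy as np

def component_log_likelihoods(frames, weights, means, variances):
    """Return an (n_frames, n_components) array of log(w_k * N(x | mean_k, var_k))."""
    diff = frames[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2) + log_norm)
    return log_pdf + np.log(weights)

def fit_global_gmm(frames, n_components, n_iter=20):
    """Fit a diagonal-covariance GMM with plain EM (illustrative only)."""
    n, _ = frames.shape
    # Assumption: deterministic start from evenly spaced frames (order dependent).
    idx = np.linspace(0, n - 1, n_components).astype(int)
    means = frames[idx].copy()
    variances = np.tile(frames.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame.
        log_p = component_log_likelihoods(frames, weights, means, variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = resp.T @ frames / nk[:, None]
        diff = frames[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return weights, means, variances

def init_states_from_gmm(phrase, n_states, gmm):
    """Seed each HMM state with the global component that best fits its segment.

    Assumption: states correspond to equal-length segments of the phrase.
    """
    weights, means, variances = gmm
    state_means, state_vars = [], []
    for segment in np.array_split(phrase, n_states):
        log_p = component_log_likelihoods(segment, weights, means, variances)
        best = int(np.argmax(log_p.sum(axis=0)))
        state_means.append(means[best])
        state_vars.append(variances[best])
    return np.array(state_means), np.array(state_vars)
```

With a feature matrix `frames` of shape (number of frames, feature dimension) pooled over all training phrases, `fit_global_gmm(frames, K)` yields the shared parameters, and `init_states_from_gmm(phrase, S, gmm)` gives starting emission parameters for a phrase model even when only one example of that phrase type is available.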
Pages: 2587-2591
Page count: 5