Automated recognition of bird song elements from continuous recordings using dynamic time warping and hidden Markov models: A comparative study

Cited by: 190
Authors
Kogan, JA [1]
Margoliash, D [1]
Affiliations
[1] Univ Chicago, Dept Organismal Biol & Anat, Chicago, IL 60637 USA
DOI
10.1121/1.421364
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The performance of two techniques is compared for automated recognition of bird song units from continuous recordings. The advantages and limitations of dynamic time warping (DTW) and hidden Markov models (HMMs) are evaluated on a large database of male songs of zebra finches (Taeniopygia guttata) and indigo buntings (Passerina cyanea), which have different types of vocalizations and have been recorded under different laboratory conditions. Depending on the quality of recordings and complexity of song, the DTW-based technique gives excellent to satisfactory performance. Under challenging conditions such as noisy recordings or presence of confusing short-duration calls, good performance of the DTW-based technique requires careful selection of templates that may demand expert knowledge. Because HMMs are trained, equivalent or even better performance of HMMs can be achieved based only on segmentation and labeling of constituent vocalizations, albeit with many more training examples than DTW templates. One weakness in HMM performance is the misclassification of short-duration vocalizations or song units with more variable structure (e.g., some calls, and syllables of plastic songs). To address these and other limitations, new approaches for analyzing bird vocalizations are discussed. © 1998 Acoustical Society of America.
Pages: 2185-2196
Number of pages: 12