Acoustic Modeling for Under-resourced Languages: A Role in Vietnamese Soccer Video Retrieval

Cited by: 0
Authors
Pham, Nhut M. [1]
Vu, Quan H. [1]
Affiliations
[1] Univ Sci, Artificial Intelligence Lab, VNU HCM, HCM, Ho Chi Minh City, Vietnam
Source
2013 INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR COMMUNICATIONS (ATC) | 2013
Keywords
soccer video; event detection; speech recognition; under-resourced language; acoustic modeling; SYSTEM
DOI
Not available
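Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809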
Abstract
Insufficient training data poses a major challenge to acoustic modeling in automatic speech recognition. The problem is more severe for under-resourced languages and for specific domains that have received little research attention. This paper explores the role of acoustic models for under-resourced languages in speech-based soccer event retrieval. An event is defined as a spatiotemporal entity of interest to users that is marked by the announcer's spoken words. Soccer events are detected by extracting spoken information from the video with a speech recognition system. To address the limited training data, subspace Gaussian mixture models are employed. Experimental evaluations are conducted on databases from the first round of World Cup 2010 and the Vietnamese AFF Suzuki Cup 2008. In the best case, transcription accuracy reaches 74.3%, and an average event detection rate of 60.62% is obtained.
Pages: 652-656
Number of pages: 5