Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

Cited by: 0
Authors
Hui, PY [1 ]
Lo, WK [1 ]
Meng, HM [1 ]
Affiliation
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China
Source
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING | 2003
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation indicates a large discrepancy in recognition performance: syllable accuracy (and the corresponding reliability of audio indexing) drops from 59% to 39% as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. Hence we present several automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information using a fuzzy c-means algorithm; (ii) extraction based only on audio information using Gaussian Mixture Models; and (iii) a fusion strategy that combines video- and audio-based extraction. This paper presents the performance of the various extraction techniques and the related retrieval performance in a known-item spoken document retrieval task.
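The three extraction routes named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the video features, the GMM configuration, or the fusion rule. The code assumes a hypothetical one-dimensional per-segment video feature, runs a textbook fuzzy c-means with two clusters on it, and fuses the resulting "studio" membership with a hypothetical audio-based studio posterior (such as one derived from GMM likelihoods) through a weighted sum. The names `fcm_memberships`, `fuzzy_c_means`, `fuse`, `weight`, and `threshold` are all illustrative.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership of each 1-D point in each cluster."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]
        )
    return memberships


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = [
            sum((u[i][j] ** m) * points[i] for i in range(len(points)))
            / sum(u[i][j] ** m for i in range(len(points)))
            for j in range(len(centers))
        ]
    return centers, fcm_memberships(points, centers, m)


def fuse(video_score, audio_score, weight=0.5, threshold=0.5):
    """Assumed fusion rule: weighted sum of two scores in [0, 1], then a threshold."""
    combined = weight * video_score + (1.0 - weight) * audio_score
    return combined >= threshold


# Hypothetical video features for six segments (low values = studio-like frames).
video_features = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
centers, u = fuzzy_c_means(video_features, [0.0, 1.0])

# Hypothetical audio-based studio posteriors for the same six segments.
audio_posteriors = [0.9, 0.7, 0.8, 0.3, 0.2, 0.1]

# Cluster 0 is treated as the studio cluster in this toy setup.
studio_flags = [fuse(u[i][0], audio_posteriors[i]) for i in range(len(u))]
print(studio_flags)
```

A weighted sum is only one possible fusion rule; alternatives such as taking the product of the two scores or requiring agreement between both detectors would fit the same interface.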
Pages: 724-727
Number of pages: 4