Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

Cited by: 0

Authors:

Hui, PY [1]

Lo, WK [1]

Meng, HM [1]

Affiliations:

[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING | 2003

Keywords:

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation reveals a large discrepancy in recognition performance: syllable accuracy (and with it the reliability of audio indexing) drops from 59% to 39% as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. We therefore present several automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information using a fuzzy c-means algorithm; (ii) extraction based only on audio information using Gaussian Mixture Models; and (iii) a fusion strategy that combines video- and audio-based extraction. This paper presents the performance of the various extraction techniques and the related retrieval performance in a known-item spoken document retrieval task.
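The abstract names fuzzy c-means clustering for the video-based extraction and a fusion of video and audio evidence, but gives no details of either. The Python sketch below is only an illustration under stated assumptions, not the authors' method: it runs textbook fuzzy c-means on a hypothetical one-dimensional per-segment video feature, then combines the resulting "studio" membership with a hypothetical audio score using a simple weighted sum. The feature values, the initial centers, the function names, and the weighted-sum rule are all assumptions introduced for illustration.

```python
def memberships_for(points, centers, m=2.0):
    """Fuzzy membership of each 1-D point in each cluster (each row sums to 1)."""
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: share full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            # Standard update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Textbook fuzzy c-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of the points weighted by membership ** m.
            weights = [row[j] ** m for row in u]
            weighted_sum = sum(w * x for w, x in zip(weights, points))
            new_centers.append(weighted_sum / sum(weights))
        centers = new_centers
    return centers, memberships_for(points, centers, m)


def fuse(video_score, audio_score, video_weight=0.5):
    """Hypothetical late fusion: weighted sum of two scores in [0, 1]."""
    return video_weight * video_score + (1.0 - video_weight) * audio_score


# Hypothetical per-segment video feature (e.g. a frame-similarity measure).
video_feature = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
centers, u = fuzzy_c_means(video_feature, centers=[0.0, 1.0])

# Treat cluster 1 as "studio" and combine with made-up audio scores.
audio_scores = [0.2, 0.1, 0.3, 0.7, 0.9, 0.6]
fused = [fuse(row[1], a) for row, a in zip(u, audio_scores)]
studio_segments = [i for i, score in enumerate(fused) if score > 0.5]
print(studio_segments)
```

In a real system the audio score would come from a trained model, such as the Gaussian Mixture Models the abstract mentions, and the fusion weight and decision threshold would be tuned on held-out data.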


Pages: 724-727

Page count: 4
