Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

Cited by: 0

Authors:

Hui, PY [1]

Lo, WK [1]

Meng, HM [1]

Affiliation:

[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING | 2003

Keywords:

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation indicates a large discrepancy in recognition performance: syllable accuracy (and the corresponding reliability of audio indexing) drops from 59% to 39% as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. Hence we present several automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information using a fuzzy c-means algorithm; (ii) extraction based only on audio information using Gaussian Mixture Models; and (iii) a fusion strategy that combines video- and audio-based extraction. This paper presents the performance of the various extraction techniques and the related retrieval performance in a known-item spoken document retrieval task.
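
Illustrative note: the abstract names fuzzy c-means as the video-based extraction technique but does not give the features or settings used. The following is a minimal sketch of the standard fuzzy c-means updates on a single scalar feature, not the authors' implementation. The function name `fuzzy_c_means_1d`, the feature values in `shot_features`, the initial centers, and the fuzzifier `m = 2` are all assumptions made for illustration.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on scalar values; returns (centers, memberships)."""
    eps = 1e-9  # avoids division by zero when a value coincides with a center
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        # Membership update: row[i] is the degree to which x belongs to cluster i.
        memberships = []
        for x in values:
            dists = [abs(x - c) + eps for c in centers]
            row = []
            for d_i in dists:
                denom = sum((d_i / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
                row.append(1.0 / denom)
            memberships.append(row)
        # Center update: mean of the values weighted by membership ** m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships


# Hypothetical per-shot visual feature (assumed values, not data from the paper).
shot_features = [0.10, 0.12, 0.09, 0.11, 0.80, 0.85, 0.78]
centers, memberships = fuzzy_c_means_1d(shot_features, centers=[0.0, 1.0])

# Harden the soft memberships: each shot goes to its highest-membership cluster.
labels = [row.index(max(row)) for row in memberships]
```

Each membership row sums to 1, so a shot that lies between the two clusters keeps a soft score rather than a hard label. Soft scores of this kind are one possible input to a later fusion step with audio-based scores, although the abstract does not say how the fusion is performed.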

Pages: 724-727

Page count: 4
