Castsearch - Context based spoken document retrieval

Cited by: 0
Authors
Molgaard, Lasse Lohilahti [1]
Jorgensen, Kasper Winther [1]
Hansen, Lars Kai [1]
Affiliation
[1] Tech Univ Denmark, Richard Petersens Plads, Bldg 321, DK-2800 Lyngby, Denmark
Source
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007
Keywords
audio retrieval; document clustering; non-negative matrix factorization; text mining;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
The paper describes our work on the development of a system for retrieval of relevant stories from broadcast news. The system utilizes a combination of audio processing and text mining. The audio processing consists of a segmentation step that partitions the audio into speech and music. The speech is further segmented into speaker segments and then transcribed using an automatic speech recognition system, to yield text input for clustering using non-negative matrix factorization (NMF). We find semantic topics that are used to evaluate the performance for topic detection. Based on these topics we show that a novel query expansion can be performed to return more intelligent search results. We also show that the query expansion helps overcome errors of the automatic transcription.
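The text-mining half of the pipeline described above (NMF clustering of transcripts, then topic-based query expansion) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy transcripts, the function names, the multiplicative-update NMF solver, and the rule of appending the top terms of the query's dominant topic. This record does not state which NMF solver or expansion formula the authors used, so this should not be read as their implementation.

```python
import random

def tokenize(text):
    return text.lower().split()

def term_document_matrix(docs):
    """Build a terms x documents count matrix V and its vocabulary."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    V = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for t in tokenize(d):
            V[index[t]][j] += 1.0
    return vocab, V

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor V (terms x docs) into W (terms x k) and H (k x docs), both
    non-negative, using multiplicative updates for the Frobenius objective.
    The solver choice is an assumption, not taken from the paper."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def expand_query(query, vocab, W, extra=2):
    """Append the `extra` highest-weighted terms of the topic that the
    query terms load on most strongly (a hypothetical expansion rule)."""
    index = {t: i for i, t in enumerate(vocab)}
    terms = tokenize(query)
    known = [t for t in terms if t in index]
    if not known:
        return terms  # nothing to expand from
    k = len(W[0])
    weights = [sum(W[index[t]][a] for t in known) for a in range(k)]
    topic = max(range(k), key=lambda a: weights[a])
    ranked = sorted(vocab, key=lambda t: W[index[t]][topic], reverse=True)
    return terms + [t for t in ranked if t not in known][:extra]

# Toy stand-ins for ASR transcripts of two news topics (invented data).
docs = [
    "election vote parliament vote",
    "parliament election minister",
    "minister vote election",
    "football goal match goal",
    "match football league",
    "league goal football",
]
vocab, V = term_document_matrix(docs)
W, H = nmf(V, k=2)
print(expand_query("election", vocab, W))
```

Because the expansion adds terms that co-occur with the query within a topic, a document can still match when the recognizer has mis-transcribed the query word itself, which is one plausible reading of how expansion helps with transcription errors.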
Pages: 93 / +
Number of pages: 2