Model-Based Unsupervised Spoken Term Detection with Spoken Queries

Cited by: 20
Authors
Chan, Chun-an [1 ]
Lee, Lin-shan [1 ]
Affiliation
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei 10617, Taiwan
Keywords
Acoustic segment model; dynamic time warping; unsupervised spoken term detection; zero-resource; ALGORITHM; WORDS;
DOI
10.1109/TASL.2013.2248714
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We present a set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that require neither speech recognition nor annotated data. This work shows the possibilities of migrating from approaches based on dynamic time warping (DTW) to model-based approaches for unsupervised STD. The proposed approach consists of three components: self-organizing models, query matching, and query modeling. To construct the self-organizing models, repeated patterns are captured and modeled using acoustic segment models (ASMs). In the query matching phase, a document state matching (DSM) approach is proposed to represent documents as ASM sequences, which are matched to the query frames. In this way, not only do the ASMs better model the signal distributions and time trajectories of speech, but the much smaller number of states than frames in the documents also leads to a much lower computational load. A novel duration-constrained Viterbi (DC-Vite) algorithm is further proposed for this matching process to handle speaking rate distortion. In the query modeling phase, a pseudo likelihood ratio (PLR) approach is proposed within the pseudo relevance feedback (PRF) framework: a likelihood ratio evaluated with query/anti-query HMMs, trained on pseudo-relevant/irrelevant examples, is used to verify the detected spoken term hypotheses. The proposed framework demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing. The best performance is achieved by integrating DTW-based approaches into the rescoring steps of the proposed framework. Experimental results show a 14.2% absolute improvement in mean average precision with a 77% reduction in CPU time compared with the segmental DTW approach on a Mandarin broadcast news corpus. Consistent improvements were found on the TIMIT and MediaEval 2011 Spoken Web Search corpora.
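As a point of reference for the computational-load argument in the abstract, the following is a minimal sketch of plain frame-level DTW, the generic technique that the DTW-based baselines build on. It is not the paper's segmental DTW, DSM, or DC-Vite algorithm; the function name `dtw_distance`, the Euclidean local distance, and the toy inputs are assumptions made for illustration only.

```python
import math


def dtw_distance(query, document):
    """Cumulative cost of the best DTW alignment between two sequences
    of equal-dimension feature vectors (illustrative sketch only)."""
    n, m = len(query), len(document)
    # cost[i][j] = best cumulative cost aligning query[:i] with document[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance: Euclidean (an assumption, not taken from the paper)
            d = math.dist(query[i - 1], document[j - 1])
            # Allowed steps: advance the query, the document, or both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy one-dimensional "frames": the document holds the first frame twice,
# so warping absorbs the difference in speaking rate.
print(dtw_distance([(0.0,), (1.0,)], [(0.0,), (0.0,), (1.0,)]))
```

The nested loops visit every (query frame, document frame) pair, so the cost grows with the product of the two lengths. Matching query frames against a much shorter sequence of model states instead of document frames shrinks the second factor, which is the saving the abstract attributes to DSM.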
Pages: 1330-1342
Number of pages: 13
Related papers
50 in total
  • [1] TOWARD UNSUPERVISED MODEL-BASED SPOKEN TERM DETECTION WITH SPOKEN QUERIES WITHOUT ANNOTATED DATA
    Chan, Chun-an
    Chung, Cheng-Tao
    Kuo, Yu-Hsin
    Lee, Lin-shan
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8550 - 8554
  • [2] UNSUPERVISED SPOKEN TERM DETECTION WITH SPOKEN QUERIES BY MULTI-LEVEL ACOUSTIC PATTERNS WITH VARYING MODEL GRANULARITY
    Chung, Cheng-Tao
    Chan, Chun-an
    Lee, Lin-shan
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] Unsupervised Spoken-Term Detection with Spoken Queries Using Segment-based Dynamic Time Warping
    Chan, Chun-an
    Lee, Lin-Shan
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 693 - 696
  • [4] Unsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition
    Chan, Chun-an
    Lee, Lin-shan
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2152 - 2155
  • [5] INTEGRATING FRAME-BASED AND SEGMENT-BASED DYNAMIC TIME WARPING FOR UNSUPERVISED SPOKEN TERM DETECTION WITH SPOKEN QUERIES
    Chan, Chun-an
    Lee, Lin-shan
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5652 - 5655
  • [6] EFFECT OF PRONUNCIATIONS ON OOV QUERIES IN SPOKEN TERM DETECTION
    Can, Dogan
    Cooper, Erica
    Sethy, Abhinav
    White, Chris
    Ramabhadran, Bhuvana
    Saraclar, Murat
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3957 - +
  • [7] Unsupervised Query-by-example spoken term detection based on DPHMM tokenizer
    Cao Jiankai
    Zhang Lianhai
    2017 IEEE 2ND ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2017, : 1321 - 1325
  • [8] Spoken term detection based on DTW
    Hou J.
    Xie L.
    Yang P.
    Xiao X.
    Leung C.-C.
    Xu H.
    Wang L.
    Lü H.
    Ma B.
    Chng E.
    Li H.
    Xie, Lei (lxie@nwpu.edu.cn), 1600, Tsinghua University (57): : 18 - 23
  • [9] Audio Mining: Unsupervised Spoken Term Detection over an Audio Database
    Kumar, Kishore R.
    Sarkar, Sandipan
    Rengaswamy, Pradeep
    Rao, K. Sreenivasa
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 514 - 518
  • [10] Unsupervised Discovery of Structured Acoustic Tokens With Applications to Spoken Term Detection
    Chung, Cheng-Tao
    Lee, Lin-Shan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (02) : 394 - 405