Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications

Cited by: 0
Authors
Shima Tabibian
Ahmad Akbari
Babak Nasersharif
Affiliations
[1] Shahid Beheshti University, Cyberspace Research Institute
[2] Iran University of Science and Technology, Audio and Speech Processing Lab, Computer Engineering Department
[3] K.N. Toosi University of Technology, Computer Engineering Department
Source
International Journal of Speech Technology | 2019 / Vol. 22
Keywords
Spoken term detection; Phone lattice; Lattice search; Scoring; Distance measure
DOI
Not available
Abstract
Spoken term detection (STD) refers to discovering all occurrences of a given term in a set of speech utterances. One well-known approach to STD is phone lattice search (PLS), which produces a phone-based lattice of the speech utterances. Since the accuracy of the phone recognizer affects the accuracy of the STD system, the PLS approach uses the minimum edit distance (MED) measure to compensate for phone recognizer errors. While this measure increases the detection rate, it also raises the false alarm rate. In this paper, we take the PLS approach as the baseline. We then use Viterbi scoring and the Jaro-Winkler similarity measure to decrease the false alarm rate. Since the proposed approach applies more techniques than the baseline, the search speed may decrease. To overcome this problem, we use lattice pruning and indexing techniques, such as the depth-first search algorithm, to increase the search speed in online and offline applications, respectively. We report experimental results for monophone-based and triphone-based STD systems. The results indicate that the triphone-based STD system improves performance by about 2% compared with the monophone-based STD system. Moreover, with triphone-based models, the proposed approach combining the MED measure, Viterbi scores, and the Jaro-Winkler similarity measure improves accuracy by about 17% over the method that uses only the MED measure.
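To make the two sequence measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It compares flat phone sequences rather than lattice paths, uses unit edit costs, and combines the measures with a simple two-threshold rule. The function names, the example phone sequences, and the thresholds `max_dist` and `min_sim` are all hypothetical; the abstract does not say how the paper combines the scores, and Viterbi scoring is omitted here.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum edit distance (Levenshtein, unit costs) between phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a phone from a
                curr[j - 1] + 1,           # insert a phone into a
                prev[j - 1] + (pa != pb),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def jaro(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical sequences."""
    if not a or not b:
        return 1.0 if not a and not b else 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, pa in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == pa:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched phones that appear in a different order count as transpositions.
    a_seq = [p for p, hit in zip(a, a_hit) if hit]
    b_seq = [p for p, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: Sequence[str], b: Sequence[str],
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix phones."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


def is_plausible_hit(term: Sequence[str], candidate: Sequence[str],
                     max_dist: int = 1, min_sim: float = 0.9) -> bool:
    """Hypothetical rule: accept only if both measures agree (assumed thresholds)."""
    return (edit_distance(term, candidate) <= max_dist
            and jaro_winkler(term, candidate) >= min_sim)


term = ["s", "a", "l", "a", "m"]        # hypothetical query phone sequence
near_miss = ["s", "a", "l", "a", "n"]   # one substitution at the end
far_miss = ["m", "a", "l", "a", "s"]    # two substitutions
print(is_plausible_hit(term, near_miss), is_plausible_hit(term, far_miss))
```

Requiring both measures to agree reflects the trade-off the abstract describes: the edit distance tolerates recognizer errors, and the second similarity check is meant to filter out candidates that would otherwise become false alarms.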
Pages: 205-217
Page count: 12