A FACTOR AUTOMATON APPROACH FOR THE FORCED ALIGNMENT OF LONG SPEECH RECORDINGS

Cited by: 26
Authors
Moreno, Pedro J. [1]
Alberti, Christopher [1]
Affiliations
[1] Google Inc, Speech Res Grp, New York, NY 10011 USA
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
finite state transducers; speech alignment; speech recognition
DOI
10.1109/ICASSP.2009.4960722
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper addresses the problem of aligning long speech recordings to their transcripts. Previous work has focused on using highly tuned language models trained on the transcripts to reduce the search space. In this paper we propose the use of a factor automaton, a well-known method for representing all substrings of a string. This automaton encodes a highly constrained language model trained on the transcripts. We show results competitive with n-gram models in several testing scenarios. Preliminary experiments show perfect alignments at a reduced computational load and with a smaller memory footprint when compared to n-gram models.
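As an illustration of the factor automaton idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the transcript is a list of word tokens and uses the standard online suffix-automaton construction, treating every state as accepting so that exactly the contiguous word sequences (factors) of the transcript are recognized. The class name `FactorAutomaton` and the method `accepts` are hypothetical, and the sketch leaves out weights, transducer composition, and everything related to the acoustic model or decoder.

```python
class FactorAutomaton:
    """Deterministic automaton accepting every contiguous sub-sequence of `tokens`.

    Built with the standard online suffix-automaton construction; because every
    state is treated as accepting, the automaton recognizes all factors.
    """

    def __init__(self, tokens):
        self.next = [{}]    # next[s][token] -> target state
        self.link = [-1]    # suffix link of each state
        self.length = [0]   # length of the longest sequence reaching each state
        self.last = 0       # state reached by the whole sequence read so far
        for tok in tokens:
            self._extend(tok)

    def _extend(self, tok):
        cur = len(self.next)
        self.next.append({})
        self.length.append(self.length[self.last] + 1)
        self.link.append(-1)
        p = self.last
        # Add the new transition along the suffix-link chain where it is missing.
        while p != -1 and tok not in self.next[p]:
            self.next[p][tok] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][tok]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split state q: the clone keeps the shorter contexts.
                clone = len(self.next)
                self.next.append(dict(self.next[q]))
                self.length.append(self.length[p] + 1)
                self.link.append(self.link[q])
                while p != -1 and self.next[p].get(tok) == q:
                    self.next[p][tok] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def accepts(self, tokens):
        """Return True if `tokens` occurs contiguously in the original sequence."""
        state = 0
        for tok in tokens:
            state = self.next[state].get(tok)
            if state is None:
                return False
        return True


# Hypothetical example transcript, chosen only for illustration.
transcript = "the cat sat on the mat".split()
fa = FactorAutomaton(transcript)
print(fa.accepts("cat sat on".split()))  # a contiguous span of the transcript
print(fa.accepts("cat on".split()))      # skips a word, so not a factor
```

The number of states stays linear in the transcript length, which is consistent with the abstract's remark about a smaller memory footprint, although this sketch makes no claim about the sizes or timings reported in the paper.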
Pages: 4869-4872
Page count: 4
Related papers
12 in total
  • [1] Allauzen C., 2004, P WORKSH INT APPR SP
  • [2] Allauzen C, 2007, LECT NOTES COMPUT SC, V4783, P11
  • [3] THE SMALLEST AUTOMATON RECOGNIZING THE SUBWORDS OF A TEXT
    BLUMER, A
    BLUMER, J
    HAUSSLER, D
    EHRENFEUCHT, A
    CHEN, MT
    SEIFERAS, J
    [J]. THEORETICAL COMPUTER SCIENCE, 1985, 40 (01) : 31 - 55
  • [4] TRANSDUCERS AND REPETITIONS
    CROCHEMORE, M
    [J]. THEORETICAL COMPUTER SCIENCE, 1986, 45 (01) : 63 - 86
  • [5] Crochemore M., 2003, Jewels of stringology
  • [6] HAZEN TJ, 2006, P INTERSPEECH
  • [7] Lightly supervised and unsupervised acoustic model training
    Lamel, L
    Gauvain, JL
    Adda, G
    [J]. COMPUTER SPEECH AND LANGUAGE, 2002, 16 (01) : 115 - 129
  • [8] MOHRI M, 2007, P INT C IMPL APPL AU
  • [9] Moreno Pedro J, 1998, P ICSLP
  • [10] SPROAT R, 2000, P ICSLP