PYCHAIN: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

Cited by: 9
Authors
Shao, Yiwen [1 ]
Wang, Yiming [1 ]
Povey, Daniel [3 ]
Khudanpur, Sanjeev [1 ,2 ]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD USA
[3] Xiaomi Corp, Beijing, Peoples R China
Source
INTERSPEECH 2020 | 2020
Keywords
end-to-end speech recognition; lattice-free MMI; PyTorch; Kaldi; speech recognition
DOI
10.21437/Interspeech.2020-3053
Abstract
We present PYCHAIN, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called chain models in the Kaldi automatic speech recognition (ASR) toolkit. Unlike other PyTorch- and Kaldi-based ASR toolkits, PYCHAIN is designed to be as flexible and lightweight as possible so that it can be easily plugged into new ASR projects or into existing PyTorch-based ASR tools, as exemplified respectively by PYCHAIN-EXAMPLE, a new project, and ESPRESSO, an existing end-to-end ASR toolkit. PYCHAIN's efficiency and flexibility are demonstrated through such novel features as full GPU training on numerator/denominator graphs and support for unequal-length sequences. Experiments on the WSJ dataset show that, with simple neural networks and commonly used machine learning techniques, PYCHAIN achieves competitive results that are comparable to Kaldi and better than other end-to-end ASR systems.
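The abstract highlights full GPU training on numerator/denominator graphs. As a rough illustration of the objective being computed, below is a minimal, self-contained PyTorch sketch of the LF-MMI loss: the forward algorithm is run over a toy numerator graph and a toy denominator graph, and the loss is the difference of their log-probabilities. The graph representation, function names, and toy graphs here are illustrative assumptions and not PYCHAIN's actual API; PYCHAIN additionally batches these computations on the GPU and supports sequences of unequal length.

import torch


def graph_log_prob(log_probs, transitions, num_states, final_states):
    # Forward-algorithm log-probability of a sequence under one graph (illustrative, not PYCHAIN's API).
    # log_probs:    (T, num_pdfs) frame-wise log-probabilities from the network
    # transitions:  list of (src_state, dst_state, pdf_id, log_weight) tuples
    # num_states:   number of graph states; state 0 is the start state
    # final_states: list of states allowed to end the sequence
    T = log_probs.size(0)
    alpha = torch.full((num_states,), float("-inf"))
    alpha[0] = 0.0  # all probability mass starts in state 0
    for t in range(T):
        new_alpha = torch.full((num_states,), float("-inf"))
        for src, dst, pdf, weight in transitions:
            score = alpha[src] + weight + log_probs[t, pdf]
            new_alpha[dst] = torch.logaddexp(new_alpha[dst], score)
        alpha = new_alpha
    return torch.logsumexp(alpha[final_states], dim=0)


def lf_mmi_loss(log_probs, num_graph, den_graph):
    # LF-MMI loss = -(log p(X | numerator graph) - log p(X | denominator graph))
    return graph_log_prob(log_probs, *den_graph) - graph_log_prob(log_probs, *num_graph)


if __name__ == "__main__":
    T, num_pdfs = 6, 3
    net_out = torch.randn(T, num_pdfs, requires_grad=True)  # stand-in for the network output
    log_probs = torch.log_softmax(net_out, dim=-1)
    # Toy numerator graph: pdf 0 followed by pdf 1, each with a self-loop.
    num_graph = ([(0, 0, 0, 0.0), (0, 1, 1, 0.0), (1, 1, 1, 0.0)], 2, [1])
    # Toy denominator graph: one state accepting any pdf (a unigram "phone LM").
    den_graph = ([(0, 0, p, 0.0) for p in range(num_pdfs)], 1, [0])
    loss = lf_mmi_loss(log_probs, num_graph, den_graph)
    loss.backward()  # gradients reach net_out through autograd
    print(float(loss))

In PYCHAIN itself the forward-backward computation is carried out with batched GPU tensor operations rather than Python loops over frames and arcs, which is what "fully parallelized" in the title refers to.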
Pages: 561-565
Number of pages: 5