MULTI-STYLE MLP FEATURES FOR BN TRANSCRIPTION

Cited by: 6
Authors
Le, Viet-Bac [1]
Lamel, Lori [1]
Gauvain, Jean-Luc [1]
Affiliation
[1] LIMSI CNRS, Spoken Language Proc Grp, F-91403 Orsay, France
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
MLP features; condition-specific adaptation; BN transcription;
DOI
10.1109/ICASSP.2010.5495116
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
It has become common practice to adapt acoustic models to specific conditions (gender, accent, bandwidth) in order to improve the performance of speech-to-text (STT) transcription systems. With the growing interest in the use of discriminative features produced by a multilayer perceptron (MLP) in such systems, the question arises of whether it is necessary to specialize the MLP to particular conditions, and if so, how to incorporate the condition-specific MLP features in the system. This paper explores three approaches (adaptation, full training, and feature merging) for using condition-specific MLP features in a state-of-the-art broadcast news (BN) STT system for French. The third approach, used without condition-specific adaptation, was found to outperform the original models with condition-specific adaptation and to perform almost as well as full training of multiple condition-specific HMMs.
Pages: 4866-4869
Number of pages: 4