MULTI-STYLE MLP FEATURES FOR BN TRANSCRIPTION

Cited by: 6

Authors:

Le, Viet-Bac [1]

Lamel, Lori [1]

Gauvain, Jean-Luc [1]

Affiliations:

[1] LIMSI CNRS, Spoken Language Proc Grp, F-91403 Orsay, France

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

MLP features; condition-specific adaptation; BN transcription

DOI:

10.1109/ICASSP.2010.5495116

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

It has become common practice to adapt acoustic models to specific conditions (gender, accent, bandwidth) in order to improve the performance of speech-to-text (STT) transcription systems. With the growing interest in the use of discriminative features produced by a multi-layer perceptron (MLP) in such systems, the question arises of whether it is necessary to specialize the MLP to particular conditions, and if so, how to incorporate the condition-specific MLP features in the system. This paper explores three approaches (adaptation, full training, and feature merging) for using condition-specific MLP features in a state-of-the-art broadcast news (BN) STT system for French. The third approach, feature merging, used without condition-specific adaptation, was found to outperform the original models with condition-specific adaptation and to perform almost as well as full training of multiple condition-specific HMMs.

Pages: 4866-4869

Page count: 4

