Improving Readability for Automatic Speech Recognition Transcription

Cited by: 5
Authors
Liao, Junwei [1]
Eskimez, Sefik [2]
Lu, Liyang [2]
Shi, Yu [2]
Gong, Ming [3]
Shou, Linjun [3]
Qu, Hong [1]
Zeng, Michael [2]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft Speech & Dialogue Res Grp, New York, NY USA
[3] Microsoft STCA NLP Grp, Beijing, Peoples R China
Keywords
Automatic speech recognition; post-processing for readability; data synthesis; pre-trained model; PUNCTUATION; CAPITALIZATION
DOI
10.1145/3557894
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern Automatic Speech Recognition (ASR) systems can achieve high recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluencies, and other noise common in spoken communication. These readability issues, introduced by speakers and ASR systems, impair both the performance of downstream tasks and the understanding of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform noisy ASR output into text that is readable for humans and usable by downstream tasks while preserving the speaker's intended meaning. We further study the APR task in terms of the benchmark dataset, evaluation metrics, and baseline models. First, to address the lack of task-specific data, we propose a method to construct a dataset for the APR task using data collected for grammatical error correction. Second, we use metrics adapted or borrowed from similar tasks to evaluate model performance on the APR task. Lastly, we use several typical or adapted pre-trained models as baseline models for the APR task. Furthermore, we fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method in terms of the proposed evaluation metrics. Experimental results show that all the fine-tuned baseline models perform better than the traditional pipeline method, and our adapted RoBERTa model outperforms the pipeline method by 4.95 and 6.63 BLEU points on two test sets, respectively. The human evaluation and case study further reveal the ability of the proposed model to improve the readability of ASR transcripts.
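As a rough illustration of the sequence-to-sequence framing described in the abstract, the sketch below builds one (noisy source, readable target) pair from a grammatical-error-correction style sentence pair. It is not the authors' data construction method, whose details are not given in this record. The helper names `to_asr_like` and `make_pair` are hypothetical, and the noise model (lowercasing and stripping punctuation) is an assumption chosen only because punctuation and capitalization appear among the keywords.

```python
import string
from typing import Tuple


def to_asr_like(text: str) -> str:
    """Hypothetical noise model: lowercase the text and strip punctuation,
    loosely mimicking unformatted ASR output (an assumption, not the paper's method)."""
    # Keep apostrophes so contractions such as "don't" stay intact.
    drop = string.punctuation.replace("'", "")
    cleaned = text.translate(str.maketrans("", "", drop))
    # Collapse any runs of whitespace left behind by removed characters.
    return " ".join(cleaned.lower().split())


def make_pair(ungrammatical: str, corrected: str) -> Tuple[str, str]:
    """Build one (source, target) example for a sequence-to-sequence model:
    the source is the noisy, ASR-like text; the target is the readable text."""
    return to_asr_like(ungrammatical), corrected


source, target = make_pair(
    "He go to school yesterday, and he like it.",
    "He went to school yesterday, and he liked it.",
)
print(source)
print(target)
```

A model trained on such pairs would learn to restore punctuation and casing and to correct grammar in a single step, which is the contrast the abstract draws with a traditional pipeline of separate stages.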
Pages: 23
Related papers
50 in total
  • [1] Improving Speech Synthesis by Automatic Speech Recognition and Speech Discriminator
    Huang, Li-Yu
    Chen, Chia-Ping
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2024, 40 (01) : 189 - 200
  • [2] Automatic Speech Recognition Post-Processing for Readability: Task, Dataset and a Two-Stage Pre-Trained Approach
    Liao, Junwei
    Shi, Yu
    Xu, Yong
    IEEE ACCESS, 2022, 10 : 117053 - 117066
  • [3] Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
    Song, Kaitao
    Wan, Teng
    Wang, Bixia
    Jiang, Huiqiang
    Qiu, Luna
    Xu, Jiahang
    Jiang, Liping
    Lou, Qun
    Yang, Yuqing
    Li, Dongsheng
    Wang, Xudong
    Qiu, Lili
    INTERSPEECH 2022, 2022, : 4820 - 4824
  • [4] The Effects of Automatic Speech Recognition Quality on Human Transcription Latency
    Gaur, Yashesh
    Lasecki, Walter S.
    Metze, Florian
    Bigham, Jeffrey P.
    13TH WEB FOR ALL CONFERENCE MONTREAL, CANADA 2016, 2016,
  • [5] Harmonicity based dereverberation for improving automatic speech recognition performance and speech intelligibility
    Kinoshita, K
    Nakatani, T
    Miyoshi, M
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2005, E88A (07) : 1724 - 1731
  • [6] Improving Deep Learning based Automatic Speech Recognition for Gujarati
    Raval, Deepang
    Pathak, Vyom
    Patel, Muktan
    Bhatt, Brijesh
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [7] Algorithms for Automatic Accentuation and Transcription of Russian Texts in Speech Recognition Systems
    Yakovenko, Olga
    Bondarenko, Ivan
    Borovikova, Mariya
    Vodolazsky, Daniil
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 768 - 777
  • [8] Improving Automatic Speech Recognition for Mobile Learning of Mathematics Through Incremental Parsing
    Isaac, Marina
    Pfluegel, Eckhard
    Hunter, Gordon
    Denholm-Price, James
    Attanayake, Dilaksha
    Coter, Guillaume
    INTELLIGENT ENVIRONMENTS 2016, 2016, 21 : 217 - 226
  • [9] A STUDY ON BIAS-BASED SPEECH SIGNAL CONDITIONING TECHNIQUES FOR IMPROVING THE ROBUSTNESS OF AUTOMATIC SPEECH RECOGNITION
    Chowdhury, Md Foezur Rahman
    Selouani, Sid-Ahmed
    O'Shaughnessy, Douglas
    2009 IEEE 22ND CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1 AND 2, 2009, : 366 - +
  • [10] Improving Automatic Speech Recognition Through Head Pose Driven Visual Grounding
    Vosoughi, Soroush
    32ND ANNUAL ACM CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2014), 2014, : 3235 - 3238