Improving Readability for Automatic Speech Recognition Transcription

Cited by: 5
Authors
Liao, Junwei [1 ]
Eskimez, Sefik [2 ]
Lu, Liyang [2 ]
Shi, Yu [2 ]
Gong, Ming [3 ]
Shou, Linjun [3 ]
Qu, Hong [1 ]
Zeng, Michael [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft Speech & Dialogue Res Grp, New York, NY USA
[3] Microsoft STCA NLP Grp, Beijing, Peoples R China
Keywords
Automatic speech recognition; post-processing for readability; data synthesis; pre-trained model; PUNCTUATION; CAPITALIZATION;
DOI
10.1145/3557894
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Modern Automatic Speech Recognition (ASR) systems can achieve high recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read because of grammatical errors, disfluencies, and other noise common in spoken communication. These readability issues, introduced by both speakers and ASR systems, impair the performance of downstream tasks and the comprehension of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform noisy ASR output into text that is readable for humans and downstream tasks while preserving the semantic meaning of the speaker. We further study the APR task in terms of its benchmark dataset, evaluation metrics, and baseline models. First, to address the lack of task-specific data, we propose a method for constructing an APR dataset from data collected for grammatical error correction. Second, we adapt or borrow metrics from similar tasks to evaluate model performance on APR. Lastly, we use several typical or adapted pre-trained models as baselines. We fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method under the proposed evaluation metrics. Experimental results show that all fine-tuned baseline models outperform the traditional pipeline method, and our adapted RoBERTa model surpasses the pipeline method by 4.95 and 6.63 BLEU points on two test sets, respectively. A human evaluation and case study further demonstrate the ability of the proposed model to improve the readability of ASR transcripts.
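To make the task setup concrete, the sketch below illustrates the overall recipe the abstract describes: synthesize ASR-style input from clean text (mirroring the idea of reusing grammatical error correction data), run a generic pre-trained sequence-to-sequence model over it, and score the output against the readable reference with BLEU. This is only an illustrative sketch under assumed simplifications, not the authors' implementation: the corruption rules, the t5-small stand-in checkpoint, and all function names are placeholders, whereas the paper's actual baselines are adapted pre-trained models (including a RoBERTa variant) fine-tuned on the constructed dataset.

# Minimal, illustrative sketch of the APR setup described above (not the authors' code).
import random
import string

import sacrebleu  # pip install sacrebleu
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer  # pip install transformers sentencepiece

FILLERS = ["uh", "um", "you know"]


def synthesize_asr_like_input(clean_text: str, filler_prob: float = 0.15) -> str:
    """Turn a clean, readable sentence into ASR-style text.

    Mimics the idea of reusing grammatical-error-correction data: lowercase the
    text, drop punctuation, and sprinkle spoken-language fillers. The exact
    corruption rules here are simplified assumptions.
    """
    words = clean_text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    noisy = []
    for word in words:
        if random.random() < filler_prob:
            noisy.append(random.choice(FILLERS))
        noisy.append(word)
    return " ".join(noisy)


def correct_transcript(noisy_text: str, model, tokenizer) -> str:
    """Run a generic encoder-decoder model as a stand-in APR seq2seq baseline."""
    inputs = tokenizer(noisy_text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    random.seed(0)
    reference = "However, the meeting was postponed until next Friday."
    noisy = synthesize_asr_like_input(reference)
    print("ASR-style input:", noisy)

    # Any pre-trained seq2seq checkpoint works as a placeholder baseline here.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    hypothesis = correct_transcript(noisy, model, tokenizer)

    # Corpus-level BLEU against the readable reference, as in the paper's evaluation.
    bleu = sacrebleu.corpus_bleu([hypothesis], [[reference]])
    print(f"Output: {hypothesis}\nBLEU: {bleu.score:.2f}")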
Pages: 23
相关论文
共 50 条
  • [41] Refining maritime Automatic Speech Recognition by leveraging synthetic speech
    Martius, Christoph
    Nakilcioglu, Emin Cagatay
    Reimann, Maximilian
    John, Ole
    MARITIME TRANSPORT RESEARCH, 2024, 7
  • [42] Autonomous measurement of speech intelligibility utilizing automatic speech recognition
    Meyer, Bernd T.
    Kollmeier, Birger
    Ooster, Jasper
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2982 - 2986
  • [43] Chhattisgarhi speech corpus for research and development in automatic speech recognition
    Londhe, Narendra D.
    Kshirsagar, Ghanahshyam B.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2018, 21 (02) : 193 - 210
  • [44] KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition
    Bang, Jeong-Uk
    Yun, Seung
    Kim, Seung-Hi
    Choi, Mu-Yeol
    Lee, Min-Kyu
    Kim, Yeo-Jeong
    Kim, Dong-Hyun
    Park, Jun
    Lee, Young-Jik
    Kim, Sang-Hun
    APPLIED SCIENCES-BASEL, 2020, 10 (19): : 1 - 17
  • [45] Validation of Speech Data for Training Automatic Speech Recognition Systems
    Krizaj, Janes
    Gros, Jerneja Zganec
    Dobrisek, Simon
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1165 - 1169
  • [46] Automatic Speech Recognition: Systematic Literature Review
    Alharbi, Sadeen
    Alrazgan, Muna
    Alrashed, Alanoud
    Alnomasi, Turkiayh
    Almojel, Raghad
    Alharbi, Rimah
    Alharbi, Saja
    Alturki, Sahar
    Alshehri, Fatimah
    Almojil, Maha
    IEEE ACCESS, 2021, 9 : 131858 - 131876
  • [47] Automatic Speech Recognition in Diverse English Accents
    Mohyuddin, Hashir
    Kwak, Daehan
    2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023, : 714 - 718
  • [48] Spectral Analysis for Automatic Speech Recognition and Enhancement
    Oruh, Jane
    Viriri, Serestina
    MACHINE LEARNING FOR NETWORKING, MLN 2020, 2021, 12629 : 245 - 254
  • [49] Automatic Speech Recognition in the professional translation process
    Ciobanu, Dragos
    TRANSLATION SPACES, 2016, 5 (01) : 124 - 144
  • [50] Automatic Speech Recognition: Do Emotions Matter?
    Catania, Fabio
    Crovari, Pietro
    Spitale, Micol
    Garzotto, Franca
    2019 IEEE INTERNATIONAL CONFERENCE ON CONVERSATIONAL DATA & KNOWLEDGE ENGINEERING (CDKE), 2019, : 9 - 16