Improving Readability for Automatic Speech Recognition Transcription

Cited by: 5
Authors
Liao, Junwei [1 ]
Eskimez, Sefik [2 ]
Lu, Liyang [2 ]
Shi, Yu [2 ]
Gong, Ming [3 ]
Shou, Linjun [3 ]
Qu, Hong [1 ]
Zeng, Michael [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft Speech & Dialogue Res Grp, New York, NY USA
[3] Microsoft STCA NLP Grp, Beijing, Peoples R China
Keywords
Automatic speech recognition; post-processing for readability; data synthesis; pre-trained model; PUNCTUATION; CAPITALIZATION;
DOI
10.1145/3557894
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern Automatic Speech Recognition (ASR) systems can achieve high recognition accuracy. However, even a perfectly accurate transcript can be challenging to read due to grammatical errors, disfluency, and other noise common in spoken communication. These readability issues, introduced by speakers and ASR systems, impair the performance of downstream tasks and the understanding of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform noisy ASR output into text that is readable for humans and downstream tasks while preserving the semantic meaning of the speaker. We further study the APR task in terms of a benchmark dataset, evaluation metrics, and baseline models: First, to address the lack of task-specific data, we propose a method to construct a dataset for the APR task from data collected for grammatical error correction. Second, we adapt or borrow metrics from similar tasks to evaluate model performance on the APR task. Lastly, we use several typical or adapted pre-trained models as baselines for the APR task. Furthermore, we fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method under the proposed evaluation metrics. Experimental results show that all fine-tuned baseline models outperform the traditional pipeline method, and our adapted RoBERTa model surpasses the pipeline method by 4.95 and 6.63 BLEU points on two test sets, respectively. A human evaluation and case study further demonstrate the ability of the proposed model to improve the readability of ASR transcripts.
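The abstract reports BLEU as one of the metrics used to compare APR models against readable references. The short sketch below is a minimal illustration, under stated assumptions, of how such a comparison could be scored: the raw ASR transcript and a post-processed version are each evaluated against a human-readable reference with corpus-level BLEU. The example sentences are invented for illustration, and the use of the sacrebleu package is an assumption; the paper's actual datasets, models, and metric configuration are not reproduced here.

```python
# Minimal sketch of BLEU-based evaluation for ASR post-processing for
# readability (APR). All sentences below are hypothetical examples; sacrebleu
# is assumed only as one common BLEU implementation.
import sacrebleu

# Hypothetical raw ASR output: disfluent, no punctuation or casing.
asr_hypotheses = [
    "uh so i i think the meeting is gonna be on on friday right",
]

# Hypothetical output of an APR model (e.g., a fine-tuned seq2seq baseline).
apr_hypotheses = [
    "So I think the meeting is going to be on Friday, right?",
]

# Human-readable reference transcript.
references = [
    "So I think the meeting is going to be on Friday, right?",
]

# sacrebleu.corpus_bleu takes a list of hypothesis strings and a list of
# reference streams (one list of strings per reference set).
asr_bleu = sacrebleu.corpus_bleu(asr_hypotheses, [references])
apr_bleu = sacrebleu.corpus_bleu(apr_hypotheses, [references])

print(f"BLEU of raw ASR transcript: {asr_bleu.score:.2f}")
print(f"BLEU of APR output:         {apr_bleu.score:.2f}")
```

In this setup, a higher BLEU score for the post-processed output than for the raw transcript indicates that the APR step moves the text closer to the readable reference, which is the kind of gain (4.95 and 6.63 BLEU points over the pipeline baseline on two test sets) the abstract reports for the adapted RoBERTa model.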
Pages: 23