Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Camara, Rafael [1]
Affiliations
[1] SoldAI Res, Merida, Yucatan, Mexico
Source
ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II | 2021, Vol. 13068
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephonic sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F-1. Results show that the implemented system and architecture improve the transcripts' relative Word Error Rate by 39% and the Jaccard distance by 13% compared with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephonic sales systems.
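The abstract names its evaluation metrics but does not give formulas. The sketch below uses the standard textbook definitions (word-level edit distance for Word Error Rate, set overlap for Jaccard distance, Precision, Recall, and F-1) and may differ from the authors' exact computation. All function names and the example strings are hypothetical illustrations, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Fraction of the baseline WER removed by the correction step."""
    return (baseline_wer - corrected_wer) / baseline_wer


def jaccard_distance(a: set, b: set) -> float:
    """1 minus intersection over union; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def precision_recall_f1(predicted: set, expected: set) -> tuple:
    """Set-based Precision, Recall, and F-1 over detected entities."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: an ASR transcript versus a human annotation.
wer = word_error_rate("two boxes of cola", "to boxes cola")
distance = jaccard_distance({"cola", "water"}, {"cola", "juice"})
```

Under these definitions, a relative improvement of 39% would mean that the corrected transcripts' Word Error Rate is 39% lower than the baseline's, as a fraction of the baseline value rather than in absolute percentage points.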
Pages: 46-58
Number of pages: 13
Related papers
50 in total
  • [1] THE MICROSOFT 2017 CONVERSATIONAL SPEECH RECOGNITION SYSTEM
    Xiong, W.
    Wu, L.
    Alleva, F.
    Droppo, J.
    Huang, X.
    Stolcke, A.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5934 - 5938
  • [2] Improving Recognition of Speech System Using Multimodal Approach
    Radha, N.
    Shahina, A.
    Khan, A. Nayeemulla
    INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, VOL 2, 2019, 56 : 397 - 410
  • [3] Cross-sentence Neural Language Models for Conversational Speech Recognition
    Chiu, Shih-Hsuan
    Lo, Tien-Hong
    Chen, Berlin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [4] Improving clinical named entity recognition in Chinese using the graphical and phonetic feature
    Yifei Wang
    Sophia Ananiadou
    Jun’ichi Tsujii
    BMC Medical Informatics and Decision Making, 19
  • [5] Improving clinical named entity recognition in Chinese using the graphical and phonetic feature
    Wang, Yifei
    Ananiadou, Sophia
    Tsujii, Jun'ichi
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2019, 19 (Suppl 7)
  • [6] Development of Hindi speech recognition system of agricultural commodities using deep neural network
    Mandal, Partho
    Jain, Shalini
    Ojha, Gaurav
    Shukla, Anupam
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1241 - 1245
  • [7] Recognition and Processing of Speech Signals Using Neural Networks
    Douglas O’Shaughnessy
    Circuits, Systems, and Signal Processing, 2019, 38 : 3454 - 3481
  • [8] Recognition and Processing of Speech Signals Using Neural Networks
    O'Shaughnessy, Douglas
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38 (08) : 3454 - 3481
  • [9] Using broad phonetic group-experts for improved speech recognition
    Scanlon, Patricia
    Ellis, Daniel P. W.
    Reilly, Richard B.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03) : 803 - 812
  • [10] SPEECH RECOGNITION USING NEURAL NETWORKS
    Kumar, T. Lalith
    Kumar, T. Kishore
    Rajan, K. Soundar
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2009, : 248 - +