Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Camara, Rafael [1]
Affiliation
[1] SoldAI Res, Merida, Yucatan, Mexico
Source
ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II | 2021 / Vol. 13068
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephone sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the corrections' impact on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F1. Results show that the implemented system and architecture improve the transcript's relative Word Error Rate by 39% and the Jaccard distance by 13% compared with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephone sales systems.
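The abstract's two headline metrics can be illustrated with a minimal Python sketch. It assumes the standard definitions: Word Error Rate as word-level edit distance divided by the number of reference words, and Jaccard distance as one minus the ratio of shared to total entities. The function names, the example sentences, and the reading of "relative" as (baseline - corrected) / baseline are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def jaccard_distance(expected, detected) -> float:
    """1 - |A intersect B| / |A union B| over two collections of entities."""
    a, b = set(expected), set(detected)
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def relative_improvement(baseline: float, corrected: float) -> float:
    """Fraction by which a lower-is-better metric drops from its baseline."""
    return (baseline - corrected) / baseline


# Hypothetical example: one substituted word out of four reference words.
wer = word_error_rate("two boxes of cola", "two box of cola")
dist = jaccard_distance({"cola", "water"}, {"cola"})
```

Under this reading, a corrected transcript whose Word Error Rate falls from 0.50 to 0.25 would count as a 50% relative improvement; the abstract reports the resulting percentages but not the underlying absolute values.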
Pages: 46 - 58
Page count: 13
Related papers
50 in total
  • [21] Bangla Short Speech Commands Recognition Using Convolutional Neural Networks
    Sumon, Shakil Ahmed
    Chowdhury, Joydip
    Debnath, Sujit
    Mohammed, Nabeel
    Momen, Sifat
    2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,
  • [22] Automatic speech recognition of Portuguese phonemes using neural networks ensemble
    Nedjah, Nadia
    Bonilla, Alejandra D.
    Mourelle, Luiza de Macedo
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 229
  • [23] Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition
    Tanaka, Tomohiro
    Masumura, Ryo
    Ihori, Mana
    Takashima, Akihiko
    Moriya, Takafumi
    Ashihara, Takanori
    Orihashi, Shota
    Makishima, Naoki
    INTERSPEECH 2021, 2021, : 4059 - 4063
  • [24] Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks
    Yu, Dong
    Deng, Li
    Seide, Frank
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 6 - 9
  • [25] Emotion Recognition from Speech using Spectrograms and Shallow Neural Networks
    Slimi, Anwer
    Hamroun, Mohamed
    Zrigui, Mounir
    Nicolas, Henri
    MOMM 2020: THE 18TH INTERNATIONAL CONFERENCE ON ADVANCES IN MOBILE COMPUTING & MULTIMEDIA, 2020, : 35 - 39
  • [26] Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling
    Mohit Dua
    R. K. Aggarwal
    Mantosh Biswas
    Neural Computing and Applications, 2019, 31 : 6747 - 6755
  • [27] Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling
    Dua, Mohit
    Aggarwal, R. K.
    Biswas, Mantosh
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (10) : 6747 - 6755
  • [28] Using Dialogue-Based Dynamic Language Models for Improving Speech Recognition
    Manuel Lucas-Cuesta, Juan
    Fernandez, Fernando
    Ferreiros, Javier
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2439 - 2442
  • [29] Leaves recognition system using a neural network
    Sekeroglu, Boran
    Inan, Yucel
    12TH INTERNATIONAL CONFERENCE ON APPLICATION OF FUZZY SYSTEMS AND SOFT COMPUTING, ICAFS 2016, 2016, 102 : 578 - 582
  • [30] Analysis of CNN-based Speech Recognition System using Raw Speech as Input
    Palaz, Dimitri
    Magimai-Doss, Mathew
    Collobert, Ronan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 11 - 15