Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Camara, Rafael [1]
Affiliations
[1] SoldAI Res, Merida, Yucatan, Mexico
Source
ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II | 2021 / Vol. 13068
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephone sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts from real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F1 score. Results show that the implemented system and architecture improve the transcript's relative Word Error Rate by 39% and the Jaccard distance by 13% in comparison with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephone sales systems.
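The two transcript-level metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the whitespace tokenization, and the example order phrases are assumptions made purely for illustration.

```python
from typing import Iterable


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; it starts as the cost of inserting j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """One minus the ratio of shared items to all distinct items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def relative_reduction(baseline: float, corrected: float) -> float:
    """Relative improvement of a corrected error score over a baseline."""
    return (baseline - corrected) / baseline


# Hypothetical example: one substituted word out of four reference words.
wer = word_error_rate("two boxes of cola", "two box of cola")
# Hypothetical entity sets: ground truth versus entities found by the NLU.
dist = jaccard_distance({"cola", "water"}, {"cola"})
```

Under these assumptions, a relative Word Error Rate improvement such as the one reported would correspond to `relative_reduction(baseline_wer, corrected_wer)` computed over the whole test set.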
Pages: 46 - 58
Number of pages: 13
Related papers
50 in total
  • [41] Speech recognition system of transformer improved by pre-parallel convolution Neural Network
    Yue, Qi
    Han, Zhan
    Chu, Jing
    Han, Xiaokai
    Li, Peiwen
    Deng, Xuhui
    PROCEEDINGS OF 2022 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION (IEEE ICMA 2022), 2022, : 928 - 933
  • [42] Speech Recognition System using Burg Method and Discrete Wavelet Transform
    Maazouzi, A.
    Laaroussi, A.
    Aqili, N.
    Raji, M.
    Hammouch, A.
    2016 INTERNATIONAL CONFERENCE ON ELECTRICAL AND INFORMATION TECHNOLOGIES (ICEIT), 2016, : 250 - 254
  • [43] Performance evaluation of Hindi speech recognition system using optimized filterbanks
    Dua, Mohit
    Aggarwal, Rajesh Kumar
    Biswas, Mantosh
    ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2018, 21 (03): : 389 - 398
  • [44] An efficient speech recognition system in adverse conditions using the nonparametric regression
    Amrouche, Abderrahmane
    Debyeche, Mohamed
    Taleb-Ahmed, Abdelmalik
    Rouvaen, Jean Michel
    Yagoub, Mustapha C. E.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2010, 23 (01) : 85 - 94
  • [45] Research on speech emotion recognition in E-Learning by using neural networks method
    Zhang, Qian
    Wang, Yan
    Wang, Lan
    Wang, Guoqiang
    2007 IEEE INTERNATIONAL CONFERENCE ON CONTROL AND AUTOMATION, VOLS 1-7, 2007, : 715 - 718
  • [46] NOISE ROBUST SPEECH RECOGNITION USING RECENT DEVELOPMENTS IN NEURAL NETWORKS FOR COMPUTER VISION
    Yoshioka, Takuya
    Ohnishi, Katsunori
    Fang, Fuming
    Nakatani, Tomohiro
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5730 - 5734
  • [47] Semi-Supervised Learning for Spanish Speech Recognition Using Deep Neural Networks
    Rosario Campomanes-Alvarez, Blanca
    Quiros, Pelayo
    Fernandez, Bernardo
    APPLICATIONS OF INTELLIGENT SYSTEMS, 2018, 310 : 19 - 29
  • [48] Feature extraction using pulse-coupled neural network in isolated speech recognition
    Jurečka, Matúš
    Komunikacie, 2006, 8 (03): : 33 - 36
  • [49] Multilingual low resource Indian language speech recognition and spell correction using Indic BERT
    Priya, M. C. Shunmuga
    Renuka, D. Karthika
    Kumar, L. Ashok
    Rose, S. Lovelyn
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2022, 47 (04):
  • [50] Speech Emotion Recognition based on Attention Weight Correction Using Word-level Confidence Measure
    Santoso, Jennifer
    Yamada, Takeshi
    Makino, Shoji
    Ishizuka, Kenkichi
    Hiramura, Takekatsu
    INTERSPEECH 2021, 2021, : 1947 - 1951