Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0

Authors:

Campos-Soberanis, Mario ^{[1]}

Campos-Sobrino, Diego ^{[1]}

Viana-Camara, Rafael ^{[1]}

Affiliations:

[1] SoldAI Res, Merida, Yucatan, Mexico

Source:

ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II | 2021 / Vol. 13068

Keywords:

Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition

DOI:

10.1007/978-3-030-89820-5_4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article describes the successful implementation of a conversational speech recognition system applied to telephone sales handled by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F1. Results show that the implemented system and architecture improve the transcript's relative Word Error Rate by 39% and the Jaccard distance by 13% compared with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephone sales systems.
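
As a rough illustration of the two headline metrics named in the abstract, the sketch below computes a word-level Word Error Rate and a Jaccard distance between sets of detected entities. It is not the authors' implementation: the function names (word_error_rate, jaccard_distance, relative_reduction) and the sample sentences are hypothetical, chosen only to show how such figures are typically obtained.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def jaccard_distance(expected, detected) -> float:
    """1 - |A ∩ B| / |A ∪ B| over two collections of entities."""
    a, b = set(expected), set(detected)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def relative_reduction(baseline: float, corrected: float) -> float:
    """Relative improvement of a lower-is-better metric over a baseline."""
    return (baseline - corrected) / baseline


if __name__ == "__main__":
    # Hypothetical transcripts, invented for illustration only.
    truth = "quiero dos cajas de refresco"
    asr_output = "quiero dos cajas refresco"
    print(word_error_rate(truth, asr_output))
    print(jaccard_distance({"dos", "cajas", "refresco"}, {"dos", "refresco"}))
```

Under this definition, a relative Word Error Rate improvement of 39% means that relative_reduction(baseline_wer, corrected_wer) evaluates to about 0.39; the absolute rates behind that figure are not given in the abstract.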


Pages: 46-58

Number of pages: 13

