Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Times cited: 0

Authors:

Campos-Soberanis, Mario [1]

Campos-Sobrino, Diego [1]

Viana-Camara, Rafael [1]

Affiliation:

[1] SoldAI Res, Merida, Yucatan, Mexico

Source:

ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II | 2021 / Vol. 13068

Keywords:

Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition

DOI:

10.1007/978-3-030-89820-5_4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This article describes the successful implementation of a conversational speech recognition system applied to telephone sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding (NLU) engine. The experiments were carried out by correcting transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate (WER) of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the corrections' impact on the entities detected by the NLU engine, we used Jaccard distance, Precision, Recall, and F-1. Results show that the implemented system and architecture improve the transcript WER by 39% (relative) and the Jaccard distance by 13% compared with the Automatic Speech Recognition (ASR) baseline, making them suitable for implementation in real-time telephone sales systems.
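The abstract scores transcripts with Word Error Rate and scores the NLU engine's entities with Jaccard distance, Precision, Recall, and F-1. As an illustration only, the following Python sketch implements the standard textbook definitions of these metrics; it is not the authors' evaluation code. It assumes whitespace tokenization for WER and assumes each utterance's entities are a set of hypothetical (type, value) pairs. How the paper normalizes text or aggregates scores across utterances is not stated in this record.

```python
"""Illustrative sketch of the evaluation metrics named in the abstract.

Assumptions (not from the paper): whitespace tokenization for WER, and
entities represented as sets of hypothetical (type, value) pairs that are
compared one utterance at a time.
"""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before row i is computed, prev[j] is the edit distance between
    # ref[:i - 1] and hyp[:j]; the first row needs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions turn ref[:i] into ""
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline: float, corrected: float) -> float:
    """Relative reduction of an error measure, e.g. 0.39 for a 39% improvement."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - corrected) / baseline


def jaccard_distance(predicted: set, gold: set) -> float:
    """1 - |A intersect B| / |A union B|; two empty sets count as identical."""
    union = predicted | gold
    if not union:
        return 0.0
    return 1.0 - len(predicted & gold) / len(union)


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Set-based precision, recall and F-1 for one utterance's entities."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical utterance and entities, not data from the paper.
    print(word_error_rate("two cases of cola please", "two cases of cold please"))  # 0.2
    gold = {("quantity", "2"), ("product", "cola")}
    predicted = {("quantity", "2"), ("product", "cold")}
    print(jaccard_distance(predicted, gold))
    print(precision_recall_f1(predicted, gold))  # (0.5, 0.5, 0.5)
```

A single substituted word ("cola" heard as "cold") costs one edit out of five reference words, and it also turns one gold entity into a wrong one, which is why transcript errors propagate into the entity metrics. The 39% figure is presumably a relative reduction of the kind `relative_improvement` computes, (baseline - corrected) / baseline, assuming the paper uses this conventional definition.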

Pages: 46-58

Number of pages: 13