Factual Consistency Oriented Speech Recognition

Cited by: 1

Authors:

Kanda, Naoyuki [1]

Yoshioka, Takuya [1]

Liu, Yang [1]

Affiliations:

[1] Microsoft, Redmond, WA 98052 USA

Source:

INTERSPEECH 2023 | 2023

Keywords:

speech recognition; speech summarization; hallucination errors; ASR

DOI:

10.21437/Interspeech.2023-485

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions, where the factual consistency score is computed by a separately trained estimator. Experimental results using the AMI meeting corpus and the VoxPopuli corpus show that the ASR model trained with the proposed framework generates ASR hypotheses that have significantly higher consistency scores with ground-truth transcriptions while maintaining word error rates close to those of cross-entropy-trained ASR models. Furthermore, it is shown that training the ASR models with the proposed framework improves the speech summarization quality as measured by the factual consistency of meeting conversation summaries generated by a large language model.
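The abstract does not spell out the training objective, so the following is only a minimal sketch of what "maximizing an expected factual consistency score" could look like. It assumes the expectation is approximated over an N-best list of hypotheses whose model probabilities are renormalized over that list; this is an assumption of the sketch, not something this record confirms. The function name `expected_score_loss` and its inputs are hypothetical, and the consistency scores are assumed to come from an external estimator that is not shown.

```python
import math


def expected_score_loss(log_probs, scores):
    """Negative expected score over an N-best list (illustrative only).

    log_probs: model log-probability of each hypothesis (hypothetical input).
    scores:    consistency score of each hypothesis against the reference,
               assumed to be produced by a separate estimator.
    """
    if not log_probs or len(log_probs) != len(scores):
        raise ValueError("log_probs and scores must be non-empty and equal in length")

    # Renormalize over the N-best list with a numerically stable softmax.
    max_lp = max(log_probs)
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    # Expected score under the renormalized hypothesis distribution.
    expected = sum(p * s for p, s in zip(posteriors, scores))

    # Minimizing this value corresponds to maximizing the expected score.
    return -expected


if __name__ == "__main__":
    # Two equally likely hypotheses, only the first judged consistent.
    print(expected_score_loss([0.0, 0.0], [1.0, 0.0]))
```

In a real training setup the log-probabilities would be differentiable model outputs and the loss would be minimized by gradient descent; plain Python floats are used here only to keep the arithmetic visible.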


Pages: 236-240

Page count: 5
