CrossASR: Efficient Differential Testing of Automatic Speech Recognition via Text-To-Speech

Cited by: 14
Authors
Asyrofi, Muhammad Hilmi [1 ]
Thung, Ferdian [1 ]
Lo, David [1 ]
Jiang, Lingxiao [1 ]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
Source
2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020) | 2020
Keywords
Automatic Speech Recognition; Text-to-Speech; Test Case Generation; Differential Testing; Failure Probability Predictor;
DOI
10.1109/ICSME46990.2020.00066
Chinese Library Classification: TP31 [Computer Software]
Discipline codes: 081202; 0835
Abstract
Automatic speech recognition (ASR) systems are ubiquitous in modern life; they can be found in our smartphones, desktops, and smart home systems. To ensure their correctness in recognizing speech, ASR systems need to be tested. Testing an ASR requires test cases in the form of audio files and their transcribed texts. Building these test cases manually, however, is tedious and time-consuming. To address this challenge, we propose CrossASR, an approach that capitalizes on existing Text-To-Speech (TTS) systems to automatically generate test cases for ASR systems. CrossASR is a differential testing solution that compares the outputs of multiple ASR systems to uncover erroneous behaviors among them. CrossASR efficiently uncovers failures with as few generated tests as possible; it does so by employing a failure probability predictor to pick the texts with the highest likelihood of leading to failed test cases. As a black-box approach, CrossASR can generate test cases for any ASR, even when the ASR model is not available (e.g., when evaluating the reliability of various third-party ASR services). We evaluated CrossASR using 4 TTSes and 4 ASRs on the Europarl corpus. The evaluated ASRs are Deepspeech, Deepspeech2, wav2letter, and wit. Our experiments on 20,000 randomly sampled English texts showed that, within an hour, CrossASR produced on average (over 3 runs) 130.34, 123.33, 47.33, and 8.66 failed test cases using the Google, ResponsiveVoice, Festival, and Espeak TTSes, respectively. Moreover, when we ran CrossASR on all 20,000 texts, it generated 13,572, 13,071, 5,911, and 1,064 failed test cases using the Google, ResponsiveVoice, Festival, and Espeak TTSes, respectively.
Based on a manual verification carried out on a statistically representative sample, we found that most samples are actual failed test cases (audio understandable to humans but not transcribed properly by an ASR), demonstrating that CrossASR is highly reliable in determining failed test cases. We make the source code for CrossASR and the evaluation data available at https://github.com/soarsmu/CrossASR.
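The differential-testing loop the abstract describes can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the `tts`, `asrs`, and `predict_failure_prob` callables stand in for the real TTS services, ASR systems, and the trained failure probability predictor, and the "at least one ASR succeeds" determinability criterion is an assumption about how cross-referencing decides that the synthesized audio itself is valid.

```python
def normalize(text):
    """Lowercase and strip punctuation so transcripts compare fairly."""
    kept = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def cross_test(texts, tts, asrs, predict_failure_prob, budget):
    """Differentially test several ASRs on TTS-generated audio.

    texts: candidate input sentences (ground-truth transcriptions).
    tts: callable text -> audio.
    asrs: dict mapping ASR name -> callable audio -> transcript.
    predict_failure_prob: callable text -> estimated failure likelihood,
        used to spend the test budget on the most promising texts first.
    budget: number of texts to actually synthesize and test.
    """
    # Rank candidate texts by predicted failure probability, highest first.
    ranked = sorted(texts, key=predict_failure_prob, reverse=True)
    failed = []
    for text in ranked[:budget]:
        audio = tts(text)  # generate the audio test input via TTS
        transcripts = {name: asr(audio) for name, asr in asrs.items()}
        correct = [n for n, t in transcripts.items()
                   if normalize(t) == normalize(text)]
        wrong = [n for n, t in transcripts.items()
                 if normalize(t) != normalize(text)]
        # A failure is only determinable when at least one ASR transcribes
        # the audio correctly; if every ASR fails, the synthesized audio
        # itself may be unintelligible rather than the ASRs being wrong.
        if correct and wrong:
            failed.append((text, wrong))
    return failed
```

The ranking step is what makes the approach budget-efficient: instead of synthesizing and transcribing all 20,000 texts, the predictor steers the limited time budget toward texts most likely to expose an ASR failure.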
Pages: 640-650 (11 pages)