Synthetic Speech Detection Based on the Temporal Consistency of Speaker Features

Cited by: 3

Authors:

Zhang, Yuxiang [1,2]

Li, Zhuo [1,2]

Lu, Jingze [1,2]

Wang, Wenchao [1,2]

Zhang, Pengyuan [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Keywords:

Feature extraction; Speech synthesis; Signal processing algorithms; Training; Robustness; Partitioning algorithms; Task analysis; Anti-spoofing; interpretability; pre-trained system; robustness; speaker verification; VERIFICATION;

DOI:

10.1109/LSP.2024.3381890

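Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: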

0808 ; 0809 ;

Abstract:

Current synthetic speech detection (SSD) methods perform well on specific datasets but require improvement in interpretability and robustness. One possible reason is the lack of interpretability analysis of synthetic speech defects. In this paper, the flaws in the temporal consistency (TC) of speaker features inherent in the speech synthesis process are analyzed. Differences in the TC of intra-utterance speaker features arise due to limited control over speaker features during speech synthesis. The speech generated by text-to-speech algorithms exhibits higher TC, while the speech generated by voice conversion algorithms yields slightly lower TC compared to bonafide speech. Based on this finding, a new SSD method based on the TC of speaker features is proposed. Modeling the TC of intra-utterance speaker features extracted by a pre-trained ASV system can be used for SSD. The proposed method achieves equal error rates of 0.84%, 3.93%, 12.98% and 24.66% on the ASVspoof 2019 LA, 2021 LA, 2021 DF and In-the-Wild evaluation datasets, respectively, demonstrating strong interpretability and robustness.
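To make the notion of intra-utterance temporal consistency concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify how TC is modeled. It assumes that a pre-trained ASV system has already produced one speaker embedding per short segment of an utterance, and it uses mean pairwise cosine similarity as a simple stand-in consistency score. The function name `temporal_consistency`, the embedding dimension of 192, and the synthetic `steady` and `drifting` arrays are all hypothetical and chosen only for illustration.

```python
import numpy as np


def temporal_consistency(embeddings: np.ndarray) -> float:
    """Mean cosine similarity over all pairs of distinct segment embeddings.

    `embeddings` has shape (T, D): T segment-level speaker embeddings of
    dimension D taken from one utterance. This score is an illustrative
    assumption, not the TC model used in the paper.
    """
    emb = np.asarray(embeddings, dtype=float)
    if emb.ndim != 2 or emb.shape[0] < 2:
        raise ValueError("need at least two segment embeddings of shape (T, D)")
    # Normalise each embedding to unit length (guarding against zero vectors).
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    # Cosine similarity matrix between every pair of segments.
    sim = unit @ unit.T
    # Average only the off-diagonal entries (pairs of distinct segments).
    off_diag = sim[~np.eye(emb.shape[0], dtype=bool)]
    return float(off_diag.mean())


# Hypothetical data: random vectors standing in for real ASV embeddings.
rng = np.random.default_rng(0)
base = rng.normal(size=192)
steady = base + 0.01 * rng.normal(size=(10, 192))   # speaker features barely change
drifting = base + 1.0 * rng.normal(size=(10, 192))  # speaker features vary a lot

print(temporal_consistency(steady), temporal_consistency(drifting))
```

Under this toy score, the `steady` utterance scores higher than the `drifting` one. How such a score relates to the higher TC reported for text-to-speech and the slightly lower TC reported for voice conversion would depend on the actual embeddings and the model described in the paper.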


Pages: 944-948

Page count: 5
