Identification of Reconstructed Speech

Times cited: 2
Authors
Wu, Haojun [1, 2, 4]
Wang, Yong [3, 5]
Huang, Jiwu [1, 4]
Affiliations
[1] Shenzhen Univ, Shenzhen, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Guangdong Polytechn Normal Univ, Guangzhou, Peoples R China
[4] Shenzhen Univ, Coll Informat Engn, Shenzhen Key Lab Media Secur, Nanhai Ave 3688, Shenzhen, Peoples R China
[5] Guangdong Polytech Normal Univ, Sch Elect & Informat, West Zhongshan Ave 293, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Audio forensics; reconstructed speech; identification; speaker verification; MFCC; GMM supervectors; LDA-ensemble classification; SUPPORT VECTOR MACHINES; SPEAKER; MODEL; ADAPTATION; ALGORITHMS; PHASE;
DOI
10.1145/3004055
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Both voice conversion and hidden Markov model (HMM)-based speech synthesis can be used to produce artificial voices of a target speaker, and both have been shown to have a serious negative impact on speaker verification (SV) systems. To enhance the security of SV systems, techniques for detecting converted/synthesized speech should be taken into consideration. During voice conversion and HMM-based synthesis, speech reconstruction is applied to transform a set of acoustic parameters into reconstructed speech. Hence, the identification of reconstructed speech can be used to distinguish converted/synthesized speech from human speech. Several related works on such identification have been reported, achieving equal error rates (EERs) below 5% for detecting reconstructed speech. However, through cross-database evaluations on different speech databases, we find that the EERs of several testing cases are higher than 10%, so the robustness of detection algorithms to different speech databases needs to be improved. In this article, we propose an algorithm to identify reconstructed speech. Three different speech databases and two different reconstruction methods are considered in our work, a setting that has not been addressed in the reported works. A high-dimensional data visualization approach is used to analyze the effect of speech reconstruction on the Mel-frequency cepstral coefficients (MFCC) of speech signals. Gaussian mixture model (GMM) supervectors of MFCC are used as acoustic features. Furthermore, a set of commonly used classification algorithms is applied to identify reconstructed speech; based on a comparison among these methods, linear discriminant analysis (LDA)-ensemble classifiers are chosen for our algorithm. Extensive experimental results show that the proposed algorithm achieves EERs lower than 1% in most cases, outperforming the reported state-of-the-art identification techniques.
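The abstract names the last stages of the pipeline (GMM supervectors of MFCC fed to an LDA-ensemble classifier, scored by EER) but gives no parameters. The following is a minimal sketch of those two stages, not the authors' implementation. It assumes the "LDA-ensemble" is a majority vote over Fisher linear discriminants, each trained on a random subset of feature dimensions. The class and function names (`train_fisher_lda`, `LDAEnsemble`, `equal_error_rate`), the number of learners, subspace size, regularization, and the toy Gaussian data standing in for supervectors are all hypothetical illustrative choices. MFCC extraction and GMM supervector construction are not shown.

```python
import numpy as np

# Hypothetical sketch: an ensemble of Fisher LDAs on random feature subspaces
# (an assumed reading of "LDA-ensemble"), plus an EER estimate over scores.


def train_fisher_lda(X0, X1, reg=1e-3):
    """Fit one Fisher linear discriminant, class 0 (human) vs class 1 (reconstructed).

    Returns a weight vector w and threshold b; x is labeled 1 when x @ w > b.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    D0, D1 = X0 - m0, X1 - m1
    # Within-class scatter, lightly regularized so the linear solve is well posed.
    Sw = D0.T @ D0 + D1.T @ D1 + reg * np.eye(X0.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    b = w @ (m0 + m1) / 2.0  # midpoint of the projected class means
    return w, b


class LDAEnsemble:
    """Majority vote over Fisher LDAs, each trained on a random feature subspace."""

    def __init__(self, n_learners=51, subspace_dim=20, seed=0):
        self.n_learners = n_learners  # odd, so the vote fraction never equals 0.5
        self.subspace_dim = subspace_dim
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        d = X.shape[1]
        k = min(self.subspace_dim, d)
        self.learners_ = []
        for _ in range(self.n_learners):
            feats = rng.choice(d, size=k, replace=False)  # random feature subset
            w, b = train_fisher_lda(X[y == 0][:, feats], X[y == 1][:, feats])
            self.learners_.append((feats, w, b))
        return self

    def score(self, X):
        """Fraction of base learners voting 'reconstructed' (class 1)."""
        votes = [X[:, feats] @ w > b for feats, w, b in self.learners_]
        return np.mean(votes, axis=0)

    def predict(self, X):
        return (self.score(X) > 0.5).astype(int)


def equal_error_rate(scores, labels):
    """Approximate EER: the threshold where false-alarm and miss rates are closest."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    best_gap, eer = 2.0, None
    for t in np.unique(scores):
        far = np.mean(neg >= t)  # human speech flagged as reconstructed
        frr = np.mean(pos < t)   # reconstructed speech that is missed
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy stand-in for GMM supervectors: two shifted Gaussian clouds in 40 dimensions.
    rng = np.random.default_rng(1)

    def sample(n):
        X = np.vstack([rng.normal(0.0, 1.0, (n, 40)), rng.normal(1.0, 1.0, (n, 40))])
        y = np.r_[np.zeros(n, int), np.ones(n, int)]
        return X, y

    X_tr, y_tr = sample(200)
    X_te, y_te = sample(200)
    clf = LDAEnsemble().fit(X_tr, y_tr)
    print("test EER:", equal_error_rate(clf.score(X_te), y_te))
```

Each base learner is cheap, needing only one linear solve in a low-dimensional subspace. Working in random subspaces is one way to keep training tractable when supervectors are very high-dimensional. Using the vote fraction as the detection score lets the same EER routine compare this classifier against others.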
Pages: 20