A high-performance speech BioHashing retrieval algorithm based on audio segmentation

Cited by: 1
Authors
Huang, Yi-Bo [1]
Chen, De-Huai [1]
Hua, Bo-Run [1]
Zhang, Qiu-Yu [2]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou, Peoples R China
[2] Lanzhou Univ Technol, Sch Comp & Commun, Lanzhou, Peoples R China
Source
COMPUTER SPEECH AND LANGUAGE | 2023 / Vol. 83
Funding
National Natural Science Foundation of China;
Keywords
High-performance speech retrieval; Biometric template; BioHashing; Audio segmentation; Hash reconstruction; QUANTIZATION;
DOI
10.1016/j.csl.2023.101551
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As one of the research hotspots in the field of speech recognition, content-based speech retrieval algorithms can detect speech with the same content features, which improves computer intelligence while reducing labor costs, and they have therefore been widely used. Although most current speech content retrieval algorithms guarantee excellent performance for small-scale retrieval, their performance degrades sharply when the speech data occupy large storage space and contain highly redundant content. To solve these problems, this paper proposes a high-performance speech BioHashing retrieval algorithm based on audio segmentation. The algorithm is divided into an offline preprocessing phase and an online retrieval phase. The offline preprocessing phase converts the speech data into BioHashing sequences that carry speech content characteristics. In this phase, the Power-Normalized Cepstral Coefficients (PNCC) features of the speech data are first extracted, and biometric templates with single mapping keys are constructed from the PNCC features to obtain BioHashing sequences. The original speech is then sliced into short-time audio segments by the proposed audio segmentation algorithm, and a hash reconstruction operation is performed on the BioHashing sequences to obtain the reconstructed hash sequences used for online retrieval. The online retrieval phase responds to users' query requests: it finds the hash index that matches the query hash sequence in the BioHashing index table and returns to the user, as the retrieval result, the entry whose standardized edit distance (SED) value is closest to 1. The experimental results show that the reconstructed hash sequences obtained after removing the silent redundant segments have better robustness and discrimination. Moreover, the algorithm achieves 100% retrieval accuracy for the original speech clips, and the average retrieval time is only 0.0157 s, which shows that the algorithm has good retrieval performance and can meet the needs of speech retrieval in various environments.
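The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and parameter in it is hypothetical. It assumes that a feature vector (for example, PNCC coefficients, whose extraction is not shown) is already available, that BioHashing is done in the generic way by projecting onto key-seeded orthonormal directions and thresholding, that silent segments can be approximated by a simple frame-energy test, and that the distance score is normalized so that 1 means an identical hash. The paper's own segmentation algorithm, hash reconstruction step, and exact SED definition are not given in the abstract.

```python
import numpy as np


def biohash(features, key, n_bits=64, threshold=0.0):
    """Generic BioHashing sketch: key-seeded orthonormal projection, then binarization."""
    features = np.asarray(features, dtype=float)
    if features.size < n_bits:
        raise ValueError("feature vector must have at least n_bits entries")
    rng = np.random.default_rng(key)  # the key acts as the seed of the mapping
    # Reduced QR of a Gaussian matrix yields n_bits orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((features.size, n_bits)))
    projected = features @ q
    return "".join("1" if p > threshold else "0" for p in projected)


def drop_silent_frames(signal, frame_len=400, rel_threshold=0.1):
    """Stand-in for audio segmentation: keep frames whose energy is not negligible."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    return frames[energy > rel_threshold * energy.max()]


def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, lower for worse matches."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def retrieve(query_hash, index):
    """Return the name of the indexed hash whose score is closest to 1."""
    return max(index, key=lambda name: similarity(query_hash, index[name]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    key = 12345  # hypothetical mapping key
    clips = {name: rng.standard_normal(128) for name in ("clip_a", "clip_b", "clip_c")}
    index = {name: biohash(vec, key) for name, vec in clips.items()}
    query = biohash(clips["clip_b"], key)
    print(retrieve(query, index))
```

In this sketch the index is a plain dictionary that is scanned linearly; a real index table would be built once in the offline phase and looked up directly in the online phase.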
Pages: 15