A high-performance speech BioHashing retrieval algorithm based on audio segmentation

Cited by: 1

Authors:

Huang, Yi-Bo [1]

Chen, De-Huai [1]

Hua, Bo-Run [1]

Zhang, Qiu-Yu [2]

Affiliations:

[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou, Peoples R China

[2] Lanzhou Univ Technol, Sch Comp & Commun, Lanzhou, Peoples R China

Source:

COMPUTER SPEECH AND LANGUAGE | 2023, Vol. 83

Funding:

National Natural Science Foundation of China

Keywords:

High-performance speech retrieval; Biometric template; BioHashing; Audio segmentation; Hash reconstruction; QUANTIZATION;

DOI:

10.1016/j.csl.2023.101551

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Content-based speech retrieval, a research hotspot in speech recognition, detects speech with the same content features; because it improves computer intelligence while reducing labor costs, it has been widely used. Most current speech content retrieval algorithms achieve excellent performance on small-scale retrieval tasks, but their performance degrades sharply when the speech data occupy large storage space and contain highly redundant content. To address these problems, this paper proposes a high-performance speech BioHashing retrieval algorithm based on audio segmentation. The algorithm is divided into an offline preprocessing phase and an online retrieval phase. The offline preprocessing phase converts the speech data into BioHashing sequences that carry speech content characteristics. First, the Power-Normalized Cepstral Coefficients (PNCC) features of the speech data are extracted, and biometric templates with single mapping keys are constructed from the PNCC features to obtain the BioHashing sequences. Then, the original speech is sliced into short-time audio segments by the proposed audio segmentation algorithm, and a hash reconstruction operation is performed on the BioHashing sequences to obtain the reconstructed hash sequences used for online retrieval. The online retrieval phase responds to users' query requests: it finds the hash index that matches the query hash sequence in the BioHashing index table and returns to the user, as the retrieval result, the entry whose standardized edit distance (SED) value is closest to 1. Experimental results show that the reconstructed hash sequences obtained after removing silent redundant segments have better robustness and discrimination. Moreover, the algorithm achieves 100% retrieval accuracy for the original speech clips with an average retrieval time of only 0.0157 s, which shows that the algorithm has good retrieval performance and can meet the needs of speech retrieval in various environments.
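
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: PNCC extraction is not shown, so frames of raw samples stand in for feature vectors; a simple energy threshold stands in for the proposed audio segmentation algorithm; the keyed mapping is a plain key-seeded random projection without orthonormalization; and matches are ranked by 1 minus a length-normalized edit distance so that the best match scores closest to 1, since the abstract does not define the SED precisely. All function names and parameters are hypothetical.

```python
import random


def drop_silent_frames(samples, frame_len=4, energy_floor=0.01):
    """Split samples into frames and keep frames whose mean energy exceeds energy_floor.

    Assumption: an energy threshold stands in for the paper's segmentation algorithm.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [f for f in frames if sum(x * x for x in f) / frame_len > energy_floor]


def biohash(features, key, n_bits=4):
    """Project a feature vector onto key-seeded random directions and binarize by sign."""
    rng = random.Random(key)  # the key plays the role of the secret mapping key
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(d * f for d, f in zip(direction, features))
        bits.append("1" if projection > 0 else "0")
    return "".join(bits)


def clip_hash(samples, key):
    """Concatenate per-frame hashes of the non-silent frames (hypothetical reconstruction)."""
    return "".join(biohash(f, key) for f in drop_silent_frames(samples))


def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """1 minus the length-normalized edit distance; identical hashes score 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def retrieve(query_hash, index):
    """Return the (clip_id, score) pair whose stored hash is most similar to the query."""
    scored = ((clip_id, similarity(query_hash, h)) for clip_id, h in index.items())
    return max(scored, key=lambda pair: pair[1])


# Toy data: clip_a contains a silent stretch; clip_a_trimmed is the same speech without it.
clip_a = [0.5, -0.4, 0.3, -0.2, 0.0, 0.0, 0.0, 0.0, 0.1, 0.6, -0.7, 0.2]
clip_a_trimmed = [0.5, -0.4, 0.3, -0.2, 0.1, 0.6, -0.7, 0.2]
clip_b = [0.9, 0.8, -0.1, 0.3, -0.6, 0.2, 0.4, -0.5]

index = {"a": clip_hash(clip_a, key=42), "b": clip_hash(clip_b, key=42)}
best_id, best_score = retrieve(clip_hash(clip_a_trimmed, key=42), index)
```

In this toy setup the silent frame is dropped before hashing, so the query built from `clip_a_trimmed` produces the same hash as the stored `clip_a` and is retrieved with a score of 1.0. This mirrors, in a very simplified form, the abstract's point that removing silent redundant segments makes the hash sequences more robust.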


Pages: 15

50 entries in total

[21] A high-performance speech neuroprosthesis
Francis R. Willett
Erin M. Kunz
Chaofei Fan
Donald T. Avansino
Guy H. Wilson
Eun Young Choi
Foram Kamdar
Matthew F. Glasser
Leigh R. Hochberg
Shaul Druckmann
Krishna V. Shenoy
Jaimie M. Henderson
Nature, 2023, 620 : 1031 - 1036
[22] Neural Network-Based Dynamic Segmentation and Weighted Integrated Matching of Cross-Media Piano Performance Audio Recognition and Retrieval Algorithm
Wang, Tianshu
COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
[23] Audio-visual speech recognition using deep bottleneck features and high-performance lipreading
Tamura, Satoshi
Ninomiya, Hiroshi
Kitaoka, Norihide
Osuga, Shin
Iribe, Yurie
Takeda, Kazuya
Hayamizu, Satoru
2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 575 - 582
[24] A novel classification-based audio segmentation algorithm
Department of Automation, Tsinghua University, Beijing 100084, China
Unknown
Tien Tzu Hsueh Pao, 2006, 4 (612-617):
[25] A high-performance speech perceptual hashing authentication algorithm based on discrete wavelet transform and measurement matrix
Zhang, Qiu-yu
Qiao, Si-bin
Huang, Yi-bo
Zhang, Tao
MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (16) : 21653 - 21669
[27] Speech retrieval from unsegmented Finnish audio using statistical morpheme-like units for segmentation, recognition, and retrieval
Turunen V.T.
Kurimo M.
ACM Transactions on Speech and Language Processing, 2011, 8 (01):
[28] Improved speaker based speech segmentation algorithm
Lu, Jian
Mao, Bing
Sun, Zheng-Xing
Zhang, Fu-Yan
Ruan Jian Xue Bao/Journal of Software, 2002, 13 (02) : 274 - 279
[29] HIGH-PERFORMANCE DIGITAL AUDIO RECORDER.
McCracken, John A.
1978, 26 (7-8):
[30] Audio fingerprint retrieval algorithm using anti-fingerprint and frequency domain segmentation
CHEN Shuli
ZHANG Xueshuai
ZHANG Pengyuan
LIU Jian
Chinese Journal of Acoustics, 2023, 42 (01) : 82 - 97
