AN ADAPTIVE VOICE ACTIVITY DETECTION ALGORITHM

Times cited: 0
Authors
Zhang Zhigang [1]
Huang Junqin [2]
Affiliations
[1] Xi'an University of Technology, School of Printing and Packaging Engineering, Xi'an, People's Republic of China
[2] Xi'an University of Technology, Engineering Training Center, Xi'an, People's Republic of China
Source
INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS | 2015 / Vol. 8 / No. 4
Keywords
Voice signal; Endpoint detection; Short-time amplitude; Multi-scale detection; Adaptive threshold
DOI
Not available
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
Voice activity detection (VAD) is a crucial step in speech processing: its accuracy and speed directly affect the quality of all subsequent processing. Some voice processing systems, such as telephone-based systems or systems operating indoors, need a simple and fast VAD method. For such representative voice signals, this paper proposes a fast, adaptive algorithm built on a major improvement to the dual-threshold endpoint detection algorithm. First, the original voice signal is amplitude-normalized, and its features are extracted using the short-time amplitude, which simplifies computation. Next, a large-scale short-time amplitude (long frame length and frame shift) is used for rough detection; combined with an adaptive threshold applied over consecutive frames, it quickly locates the regions containing the speech start point and end point. Within these regions, a small-scale short-time amplitude (short frame length and frame shift) is used for accurate detection: the start-point region is scanned forward and the end-point region is scanned in reverse, again with an adaptive threshold over consecutive frames, so that the start point and end point of the effective speech can be located accurately. Experimental results show that the proposed method detects the endpoints of voice signals more quickly and accurately, and that it can improve recognition performance dramatically. The large scale increases detection speed and the small scale improves detection accuracy; both can be adjusted to meet different requirements. The method therefore ensures both detection speed and precision, which gives it greater flexibility and applicability.
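The abstract does not give the frame lengths, the threshold formula, or the number of consecutive frames required, so the following is only a minimal Python sketch of the coarse-to-fine idea under stated assumptions. The function names (`normalize`, `short_time_amplitude`, `first_run`, `detect_endpoints`, `synthetic_signal`) and every parameter value are hypothetical. In particular, the adaptive threshold used here is an assumed stand-in, not the authors' formula: the noise level is estimated from the leading samples (assumed to be silence), and the threshold sits a fraction `ratio` of the way from that level to the loudest coarse frame.

```python
import math
import random


def normalize(x):
    """Amplitude normalization: scale so the peak absolute value is 1."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x] if peak > 0 else list(x)


def short_time_amplitude(x, frame_len, hop, start=0, stop=None):
    """Mean absolute amplitude of each frame of x[start:stop].

    Using the mean (not the sum) keeps values comparable across frame lengths,
    so one threshold can serve both the large and the small scale.
    """
    stop = len(x) if stop is None else stop
    amps, pos = [], start
    while pos + frame_len <= stop:
        amps.append(sum(abs(v) for v in x[pos:pos + frame_len]) / frame_len)
        pos += hop
    return amps


def first_run(amps, thr, min_run, reverse=False):
    """Scan forward (or in reverse) for min_run consecutive frames above thr.

    Returns the index of the run's first frame in scan order (lowest index for
    a forward scan, highest index for a reverse scan), or None if no run exists.
    """
    order = range(len(amps) - 1, -1, -1) if reverse else range(len(amps))
    run = 0
    for i in order:
        run = run + 1 if amps[i] > thr else 0
        if run == min_run:
            return i + min_run - 1 if reverse else i - min_run + 1
    return None


def detect_endpoints(x, coarse=(400, 200), fine=(80, 40),
                     noise_len=1600, ratio=0.1, min_run=3):
    """Two-scale endpoint detection; returns (start, end) sample indices."""
    x = normalize(x)
    big_len, big_hop = coarse
    small_len, small_hop = fine
    amps = short_time_amplitude(x, big_len, big_hop)
    if not amps:
        return None
    # ASSUMED adaptive threshold (not the paper's formula): leading samples
    # are treated as silence to estimate the noise level.
    head = x[:noise_len]
    noise = sum(abs(v) for v in head) / len(head)
    thr = noise + ratio * (max(amps) - noise)

    # Rough detection at the large scale.
    s = first_run(amps, thr, min_run)
    e = first_run(amps, thr, min_run, reverse=True)
    if s is None or e is None:
        return None

    # Accurate detection: forward scan of the start area, taken here as one
    # coarse frame length either side of coarse start frame s (an assumption).
    lo, hi = max(0, s * big_hop - big_len), min(len(x), s * big_hop + big_len)
    i = first_run(short_time_amplitude(x, small_len, small_hop, lo, hi),
                  thr, min_run)
    start = lo + i * small_hop if i is not None else s * big_hop

    # Reverse scan of the end area, starting at coarse end frame e.
    lo, hi = e * big_hop, min(len(x), e * big_hop + 2 * big_len)
    j = first_run(short_time_amplitude(x, small_len, small_hop, lo, hi),
                  thr, min_run, reverse=True)
    end = lo + j * small_hop + small_len if j is not None else e * big_hop + big_len
    return start, end


def synthetic_signal(n=8000, onset=2000, offset=5000, fs=8000, seed=0):
    """Hypothetical test input: low noise plus a 440 Hz tone in [onset, offset)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.01, 0.01)
            + (0.8 * math.sin(2 * math.pi * 440 * t / fs) if onset <= t < offset else 0.0)
            for t in range(n)]
```

For example, `detect_endpoints(synthetic_signal())` should bracket the tone (samples 2000 to 5000) to within about one fine frame. In this sketch the small-scale frames are evaluated only inside the two search areas rather than across the whole signal, which is one plausible reading of how the large scale buys speed while the small scale restores precision; enlarging `coarse` or shrinking `fine` trades one against the other.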
Pages: 2175-2194
Page count: 20