Automatic Speech Recognition: An Improved Paradigm

Cited by: 0
Authors
Topoleanu, Tudor-Sabin
Mogan, Gheorghe Leonte
Affiliation
Source
TECHNOLOGICAL INNOVATION FOR SUSTAINABILITY | 2011 / Vol. 349
Keywords
automatic speech recognition; natural language processing; probabilistic language acquisition; unsupervised learning of speech; MACHINE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a short survey of automatic speech recognition systems, underlining the achievements and capabilities of current solutions as well as their inherent limitations and shortcomings. In response, we propose an improved paradigm and algorithm for building an automatic speech recognition system that actively adapts its recognition model in an unsupervised fashion by listening to continuous human speech. The paradigm relies on a semi-autonomous system that samples continuous human speech in order to record phonetic units. The system then processes these phoneme-sized samples to determine the degree of similarity between them, which allows the same phoneme to be detected across many samples. Once a sufficiently large database of samples has been gathered, the system clusters the samples by their degree of similarity, creating a separate cluster for each phoneme. The system then trains one neural network per cluster, using the samples in that cluster. After a few iterations of sampling, processing, clustering and training, the system should contain a neural network detector for each phoneme unit of the spoken language to which it has been exposed, and should be able to use these detectors to recognize phonemes from live speech. Finally, we provide the structure and algorithms for this novel automatic speech recognition paradigm.
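The sample, compare, cluster and train loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the clustering method or the network architecture, so the sketch assumes cosine similarity, a greedy leader clustering with a fixed threshold, and a single logistic neuron as each detector. The "phoneme samples" are synthetic 3-dimensional vectors, only one pass of the loop is shown rather than several iterations, and every function name (`similarity`, `cluster_samples`, `train_detector`, `recognize`, `run_pipeline`) is hypothetical.

```python
import math
import random


def similarity(a, b):
    """Cosine similarity between two feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_samples(samples, threshold=0.8):
    """Greedy leader clustering (assumed): a sample joins the first cluster
    whose leader is at least `threshold` similar, else it starts a new one."""
    clusters = []  # lists of sample indices; element 0 of each list is the leader
    for i, sample in enumerate(samples):
        for members in clusters:
            if similarity(samples[members[0]], sample) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def score(detector, x):
    """Pre-activation output of a one-neuron detector."""
    weights, bias = detector
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_detector(positives, negatives, epochs=50, lr=0.5):
    """Train a single logistic neuron by stochastic gradient descent."""
    detector = ([0.0] * len(positives[0]), 0.0)
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, target in data:
            prob = 1.0 / (1.0 + math.exp(-score(detector, x)))
            grad = prob - target  # gradient of the log loss w.r.t. the score
            weights, bias = detector
            detector = ([w - lr * grad * xi for w, xi in zip(weights, x)],
                        bias - lr * grad)
    return detector


def recognize(detectors, x):
    """Return the index of the detector that responds most strongly."""
    return max(range(len(detectors)), key=lambda k: score(detectors[k], x))


def run_pipeline(seed=0, per_phoneme=20, noise=0.03):
    """One sampling -> clustering -> training pass on synthetic data."""
    rng = random.Random(seed)
    # Three made-up "phoneme" prototypes; real input would be acoustic features.
    prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    samples = [[v + rng.gauss(0.0, noise) for v in proto]
               for proto in prototypes for _ in range(per_phoneme)]
    clusters = cluster_samples(samples)
    detectors = []
    for members in clusters:
        inside = set(members)
        positives = [samples[i] for i in members]
        negatives = [s for i, s in enumerate(samples) if i not in inside]
        detectors.append(train_detector(positives, negatives))
    return clusters, detectors


clusters, detectors = run_pipeline()
print(len(clusters), "clusters found")
print("recognized cluster:", recognize(detectors, [0.0, 1.0, 0.0]))
```

The threshold-based clustering is chosen here because it does not require knowing the number of phonemes in advance, which matches the unsupervised setting the abstract describes; a real system would need a similarity measure suited to variable-length audio segments.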
Pages: 269 / +
Page count: 3