RETRACTED: Research on the application of speech recognition in computer network technology in the era of big data (Retracted Article)

Cited by: 0
Authors
Zhang, Baohua [1 ]
Affiliation
[1] Changzhou Vocat Inst Engn, Gen Educ Teaching Dept, Changzhou 213164, Jiangsu, Peoples R China
Keywords
Big data; Speech recognition; Computer; Network technology; Intelligibility
DOI
10.1007/s10772-021-09936-7
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Code
0808; 0809
Abstract
Since the emergence of big data, increasingly complex deep models have revealed the enormous amount of information carried by millions of data samples. In speech recognition, the coverage and availability of training data are the main factors in improving system performance. Big data is usually defined loosely: it refers both to the need for new processing models that enable better decision-making and to the ability to identify and optimize processes that adapt to the abundance, growth, and diversification of information products. With the rise of deep learning, various types of neural networks have gradually been applied to acoustic and language models. The most common are the deep neural network (DNN), the convolutional neural network (CNN), the recurrent neural network (RNN), and its improved variant, the long short-term memory (LSTM) network. Compared with traditional methods, acoustic and language models based on deep neural networks now deliver significantly better performance and have become the mainstream in practice, but deep-neural-network speech recognition still has room for improvement. With the advent of big data, ever more data become available, so using them effectively in a deep-neural-network speech recognition system is a challenge. In addition, speech signals exhibit long-range temporal dependencies, which matter greatly for the design of acoustic and language models. RNNs and LSTMs can model such long-term dependencies successfully, but their training is more complicated. It is therefore important to develop a complete and highly effective neural model with long-term modeling capability for big data.
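The abstract's claim that LSTMs capture long-range context in speech can be illustrated with a minimal single-cell LSTM step written in NumPy. This is a generic sketch, not code from the article: the frame dimensions, weight initialization, and the MFCC-like input are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: four gates computed from [x; h_prev]."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.size
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    o = sigmoid(z[2 * H:3 * H]) # output gate
    g = np.tanh(z[3 * H:])      # candidate update
    c = f * c_prev + i * g      # cell state carries long-term context
    h = o * np.tanh(c)          # hidden state is the per-frame output
    return h, c

# Toy acoustic-feature sequence: 50 frames of 13-dim MFCC-like vectors
rng = np.random.default_rng(0)
T, D, H = 50, 13, 32
frames = rng.standard_normal((T, D))

W = rng.standard_normal((4 * H, D + H)) * 0.1  # small random weights
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in frames:
    h, c = lstm_step(x, h, c, W, b)

print(h.shape)  # the final hidden state summarizes the whole utterance
```

Because the forget gate `f` multiplies the previous cell state rather than overwriting it, information from early frames can persist across the full 50-frame sequence, which is the long-term modeling property the abstract highlights.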
Pages: 259 / 259
Number of pages: 10
Related Papers
20 in total
[1] Aghaahmadi M., Dehshibi M. M., Bastanfard A., Fazlali M. Clustering Persian viseme using phoneme subspace for developing visual speech application. Multimedia Tools and Applications, 2013, 65(3): 521-541.
[2] Ahmad N. Journal of the Acoustical Society of America, 2008, 123: 3939. DOI: 10.1121/1.2936016.
[3] Alexandre D. INT J IMAGING, 2010, 4: 60.
[4] Alizadeh S., Boostani R., Asadpour V. Lip feature extraction and reduction for HMM-based visual speech recognition systems. ICSP 2008: 9th International Conference on Signal Processing, Vols 1-5, Proceedings, 2008: 561+.
[5] Aschenberner B. PHONEME VISEME MAPPI, 2005: 1.
[6] Baswaraj B. D. GLOBAL J COMPUTER SC, 2012, 12.
[7] Bear H. L. FINDING PHONEMES IMP, 2017: 115.
[8] Bear H. L., Harvey R. Comparing heterogeneous visual gestures for measuring the diversity of visual speech signals. Computer Speech and Language, 2018, 52: 165-190.
[9] Bear H. L. INT CONF ACOUST SPEE, 2016: 2009. DOI: 10.1109/ICASSP.2016.7472029.
[10] Binnie C. A., Jackson P. L., Montgomery A. A. Visual intelligibility of consonants: a lipreading screening test with implications for aural rehabilitation. Journal of Speech and Hearing Disorders, 1976, 41(4): 530-539.