Cognitive technology in task-oriented dialogue systems: concepts, advances and future

Cited by: 0
Authors
Yu, Kai [1 ,2 ]
Chen, Lu [1 ,2 ]
Chen, Bo [1 ,2 ]
Sun, Kai [1 ,2 ]
Zhu, Su [1 ,2 ]
Affiliations
[1] Speech Lab, Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai
[2] Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2015 / Vol. 38 / No. 12
Keywords
Cognitive control; Cognitive technology; Dialogue systems; Human-computer interaction; Human-machine interface
DOI
10.11897/SP.J.1016.2015.02333
Abstract
A human-machine dialogue system is a human-machine interaction system that treats the machine as a cognitive agent. With advances in computing hardware and software and the boom of the mobile internet, cognitive dialogue systems, which can deal with uncertain interactive information, have attracted great interest. This paper argues that a task-oriented dialogue system consists of three layers: a physical layer, a control layer, and an application layer, whose corresponding techniques are I/O technology, cognitive technology, and knowledge management, respectively. Cognitive technology is a new middleware technology that has recently emerged from the need for instant, natural human-machine conversation. Its goal is to make the machine a cognitive agent capable of understanding, learning, guiding, and adapting. This requires deep and robust understanding, inference over uncertain information, policy optimization, adaptation, and the generation of influential information. This paper is a position paper on cognitive technology: it introduces the scope and content of cognitive technology in dialogue systems, reviews relevant techniques, and discusses future research directions. © 2015, Science Press. All rights reserved.
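The abstract's three-layer view places cognitive technology in the control layer, covering understanding, tracking of uncertain information, policy, and response generation. The following is a minimal, purely illustrative Python sketch of such a control loop, not the authors' system; the domain slots, rules, and confidence scores are hypothetical.

```python
# Illustrative sketch of a task-oriented dialogue control loop:
# understanding -> belief (uncertainty) tracking -> policy -> generation.
# All slot names, rules, and confidences are hypothetical.

from collections import defaultdict

SLOTS = ("food", "area")  # hypothetical domain slots


def understand(user_text):
    """Toy semantic decoder: returns (slot, value, confidence) hypotheses."""
    hypotheses = []
    for value, slot, conf in (("chinese", "food", 0.8),
                              ("north", "area", 0.6)):
        if value in user_text.lower():
            hypotheses.append((slot, value, conf))
    return hypotheses


def update_belief(belief, hypotheses):
    """Accumulate evidence per slot value; a stand-in for belief tracking."""
    for slot, value, conf in hypotheses:
        belief[slot][value] += conf
    return belief


def policy(belief):
    """Request the first unfilled slot, otherwise inform a result."""
    for slot in SLOTS:
        if not belief[slot]:
            return ("request", slot)
    return ("inform", None)


def generate(action):
    """Template-based response generation for the two toy dialogue acts."""
    act, slot = action
    if act == "request":
        return f"What {slot} would you like?"
    return "Here is a venue matching your constraints."


belief = {slot: defaultdict(float) for slot in SLOTS}
for turn in ("I want chinese food", "somewhere in the north"):
    belief = update_belief(belief, understand(turn))
    print(generate(policy(belief)))
```

Running the loop on the two sample turns first requests the missing "area" slot and then issues an inform act, illustrating how accumulated (uncertain) evidence drives the policy.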
Pages: 2333-2348
Page count: 15