A deep learning analysis on question classification task using Word2vec representations

Cited by: 2
Authors
Seyhmus Yilmaz
Sinan Toklu
Affiliation
[1] Düzce University, Department of Computer Engineering, Faculty of Engineering
Source
Neural Computing and Applications | 2020 / Vol. 32
Keywords
Deep learning; Question classification; SVM; Word embedding; Word2vec
DOI
Not available
Abstract
Question classification is an essential first step for automatic question answering systems. Linguistic features play a significant role in developing an accurate question classifier. Recently, deep learning systems have achieved remarkable success in various text-mining problems such as sentiment analysis, document classification, spam filtering, document summarization, and web mining. In this study, we investigate several deep learning architectures for a question classification task in Turkish, a highly inflectional, agglutinative language in which words are formed by adding suffixes (morphemes) to a root word. As a non-Indo-European language, Turkish has some unique features that make it challenging for natural language processing. For instance, Turkish has no grammatical gender or noun classes. User questions in Turkish are used to train and test the deep learning architectures, and the architectures are compared in terms of test accuracy and 10-fold cross-validation accuracy. We use two major deep learning models, long short-term memory (LSTM) and convolutional neural networks (CNN), and we also implement the combined CNN-LSTM and CNN-SVM structures, along with a number of variants of these architectures obtained by changing the vector sizes and the embedding types. In addition, we build word embeddings using the Word2vec method with the CBOW and skip-gram models and different vector sizes on a large corpus composed of user questions. We also investigate the effect of using different pre-trained Word2vec word embeddings on these deep learning architectures. Experimental results show that the choice of Word2vec model has a significant impact on the accuracy of the different deep learning models. Additionally, no labeled Turkish question dataset exists, so another contribution of this study is a new Turkish question dataset translated from the UIUC English question dataset. Using these techniques, we reach an accuracy of 94% on the question dataset.
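The abstract contrasts the CBOW and skip-gram variants of Word2vec without describing them. As a rough illustration only, and not the authors' code, the following pure-Python sketch shows how the two variants frame their training examples from the same tokenized question. The function names, the window size, and the sample question are assumptions made for this example; they are not taken from the paper or its dataset.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW framing: the surrounding context words are the input, the center word is the target."""
    return [(context_window(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram framing: the center word is the input, each context word is a separate target."""
    return [
        (tokens[i], context)
        for i in range(len(tokens))
        for context in context_window(tokens, i, window)
    ]


if __name__ == "__main__":
    # Hypothetical Turkish question ("What is the capital of Turkey?"),
    # made up for illustration and not drawn from the paper's dataset.
    question = "türkiye'nin başkenti neresidir".split()
    for context, target in cbow_examples(question, window=1):
        print("CBOW     ", context, "->", target)
    for center, context in skipgram_examples(question, window=1):
        print("skip-gram", center, "->", context)
```

Both framings are built from the same context windows; they differ only in which side is predicted. The vector size mentioned in the abstract is the dimensionality of the embedding learned from such examples, which this sketch does not train.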
Pages: 2909-2928
Number of pages: 19