A deep learning analysis on question classification task using Word2vec representations

Cited by: 2
Authors
Seyhmus Yilmaz
Sinan Toklu
Affiliation
[1] Düzce University, Department of Computer Engineering, Faculty of Engineering
Source
Neural Computing and Applications | 2020 / Vol. 32
Keywords
Deep learning; Question classification; SVM; Word embedding; Word2vec
DOI
Not available
Abstract
Question classification is an essential first step in automatic question answering systems. Linguistic features play a significant role in developing an accurate question classifier. Recently, deep learning systems have achieved remarkable success in various text-mining problems such as sentiment analysis, document classification, spam filtering, document summarization, and web mining. In this study, we investigate several deep learning architectures for question classification in Turkish, a highly inflectional, agglutinative language in which words are formed by adding suffixes (morphemes) to a root word. As a non-Indo-European language, Turkish has some unique features that make it challenging for natural language processing. For instance, Turkish has no grammatical gender or noun classes. User questions in Turkish are used to train and test the deep learning architectures, and the architectures are compared in terms of test accuracy and 10-fold cross-validation accuracy. We use two major deep learning models, long short-term memory (LSTM) and convolutional neural networks (CNN), and we also implement the combined CNN-LSTM and CNN-SVM structures, along with a number of variants of these architectures obtained by changing vector sizes and embedding types. We also build word embeddings using the Word2vec method, with both CBOW and skip-gram models and different vector sizes, on a large corpus of user questions. We further investigate the effect of using different pre-trained Word2vec word embeddings on these deep learning architectures. Experimental results show that the choice of Word2vec model has a significant impact on the accuracy of the different deep learning models. Additionally, because no labeled Turkish question dataset exists, another contribution of this study is a new Turkish question dataset translated from the UIUC English question dataset. Using these techniques, we reach an accuracy of 94% on the question dataset.
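The abstract contrasts the CBOW and skip-gram variants of Word2vec. As a rough illustration only, the sketch below shows how the two variants frame their training examples from a tokenized question. It is not the authors' implementation: the function names, the window size, and the sample question are all assumptions made for this example.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram framing: each center word predicts each word in its window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW framing: the surrounding context words jointly predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip single-token inputs, which have no context
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    # Hypothetical sample question, not taken from the paper's dataset.
    question = "türkiye'nin başkenti neresidir".split()
    for center, context_word in skipgram_pairs(question, window=1):
        print("skip-gram:", center, "->", context_word)
    for context, center in cbow_examples(question, window=1):
        print("CBOW:", context, "->", center)
```

Skip-gram yields one example per (center, context) pair, while CBOW yields one example per position with the whole window as input, which is one reason the two variants can produce different embeddings from the same corpus.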
Pages: 2909-2928
Page count: 19