Using word embeddings in Twitter election classification

Cited: 6
Authors
Xiao Yang
Craig Macdonald
Iadh Ounis
Affiliations
[1] University of Glasgow, School of Computing Science
Source
Information Retrieval Journal | 2018, Vol. 21
Keywords
Word embedding; CNN; Twitter; Election classification; Word2vec
DOI
Not available
Abstract
Word embeddings and convolutional neural networks (CNN) have attracted extensive attention in various classification tasks for Twitter, e.g. sentiment classification. However, the effect of the configuration used to generate the word embeddings on the classification performance has not been studied in the existing literature. In this paper, using a Twitter election classification task that aims to detect election-related tweets, we investigate the impact of the background dataset used to train the embedding models, as well as the parameters of the word embedding training process, namely the context window size, the dimensionality and the number of negative samples, on the attained classification performance. By comparing the classification results of word embedding models that have been trained using different background corpora (e.g. Wikipedia articles and Twitter microposts), we show that the background data should align with the Twitter classification dataset both in data type and time period to achieve significantly better performance than baselines such as SVM with TF-IDF. Moreover, by evaluating the results of word embedding models trained using various context window sizes and dimensionalities, we find that larger context windows and dimensionalities are preferable for improving performance. However, the number of negative samples does not significantly affect the performance of the CNN classifiers. Our experimental results also show that choosing the correct word embedding model for use with CNN leads to statistically significant improvements over various baselines such as random, SVM with TF-IDF and SVM with word embeddings. Finally, for out-of-vocabulary (OOV) words that are not available in the learned word embedding models, we show that a simple strategy that randomly initialises the OOV words without any prior knowledge is sufficient to attain good classification performance among the current OOV strategies (e.g. a random initialisation using the statistics of the pre-trained word embedding models).
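The three training parameters studied in the abstract (context window size, dimensionality, number of negative samples) map directly onto the standard word2vec interface, and the two OOV strategies the paper compares can be expressed as simple lookup functions. The snippet below is a minimal sketch, not the authors' code: it assumes gensim (>= 4.0) and numpy, and the toy corpus, parameter values and initialisation bounds are illustrative choices rather than the paper's exact configuration.

```python
import numpy as np
from gensim.models import Word2Vec

# Tokenised background corpus (e.g. tweets); placeholder data for illustration.
corpus = [
    ["vote", "early", "in", "the", "election"],
    ["polling", "stations", "open", "at", "seven"],
]

model = Word2Vec(
    corpus,
    vector_size=500,  # dimensionality; the paper finds larger sizes preferable
    window=10,        # context window size; larger windows also helped
    negative=10,      # number of negative samples; found to have little effect
    sg=1,             # skip-gram variant of word2vec
    min_count=1,      # keep every token in this tiny toy corpus
)

rng = np.random.default_rng(0)

def embed(word):
    """Embedding lookup; OOV words get a uniform random vector (no prior knowledge)."""
    if word in model.wv.key_to_index:
        return model.wv[word]
    return rng.uniform(-0.25, 0.25, model.vector_size).astype(np.float32)

def embed_with_stats(word):
    """Alternative OOV strategy: sample using the pre-trained vectors' statistics."""
    if word in model.wv.key_to_index:
        return model.wv[word]
    mu, sigma = model.wv.vectors.mean(), model.wv.vectors.std()
    return rng.normal(mu, sigma, model.vector_size).astype(np.float32)

print(embed("election").shape)  # in-vocabulary word
print(embed("brexit").shape)    # OOV word, randomly initialised
```

The abstract's finding is that the first, prior-knowledge-free strategy is already sufficient, so the statistics-based variant buys little in this task.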
Pages: 183–207
Page count: 24