Word2vec convolutional neural networks for classification of news articles and tweets

Times cited: 93
Authors
Jang, Beakcheol [1]
Kim, Inhwan [1]
Kim, Jong Wook [1]
Affiliation
[1] Sangmyung Univ, Dept Comp Sci, Seoul, South Korea
Source
PLOS ONE | 2019 / Vol. 14 / No. 8
DOI
10.1371/journal.pone.0220976
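Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract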
Big web data from sources such as online news and Twitter are good resources for investigating deep learning. However, collected news articles and tweets almost certainly contain data that are unnecessary for learning, which hinders accurate learning. This paper explores the performance of word2vec Convolutional Neural Networks (CNNs) in classifying news articles and tweets as related or unrelated. Using the two word embedding algorithms of word2vec, Continuous Bag-of-Words (CBOW) and Skip-gram, we constructed a CNN with the CBOW model and a CNN with the Skip-gram model. We measured the classification accuracy of the CNN with CBOW, the CNN with Skip-gram, and a CNN without word2vec on real news articles and tweets. The experimental results indicated that word2vec significantly improved the accuracy of the classification model. The accuracy of the CBOW model was higher and more stable than that of the Skip-gram model. The CBOW model performed better on news articles, whereas the Skip-gram model performed better on tweets. In addition, the CNN with word2vec models was more effective on news articles than on tweets, because news articles are typically more uniform than tweets.
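The two word2vec variants named in the abstract differ mainly in the direction of prediction: CBOW predicts a center word from its surrounding context, while Skip-gram predicts each context word from the center word. The following is a minimal, illustrative sketch in plain Python, not the authors' implementation; the function names `cbow_pairs` and `skipgram_pairs`, the window size, and the example sentence are hypothetical. It shows only how each variant builds its training examples from a tokenized sentence and omits the embedding training itself (e.g. negative sampling) and the downstream CNN.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW (hypothetical helper): (context words, center word) pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Collect every neighbor inside the window, excluding the center itself.
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # a one-token sentence has no context to predict from
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram (hypothetical helper): one (center word, context word) pair per neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "stocks fell after the report".split()
    print(cbow_pairs(sentence, window=1))
    print(skipgram_pairs(sentence, window=1))
```

With a window of 1, the five-token sentence yields five CBOW examples (one per center word) but eight Skip-gram examples (one per neighbor). Skip-gram therefore generates more training signal per occurrence of a word, which is one commonly cited reason it can cope better with sparse, irregular text. Whether that explains the paper's tweet results is not established by this sketch.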
Pages: 20