Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification

Cited by: 15
Authors
Kuang, Sicong [1]
Davison, Brian D. [1]
Affiliations
[1] Lehigh Univ, Dept Comp Sci & Engn, 19 Mem Dr West, Bethlehem, PA 18015 USA
Source
APPLIED SCIENCES-BASEL | 2017 / Vol. 7 / Issue 08
Keywords
word embedding; healthcare; classification;
DOI
10.3390/app7080846
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Twitter is a popular source for monitoring healthcare information and disease among the public. However, tweets contain a great deal of noise: even when appropriate keywords appear in a tweet, they do not guarantee that the tweet is truly health-related. Traditional keyword-based classification is therefore largely ineffective. Word embedding algorithms have proved useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding learning algorithm, the continuous bag-of-words (CBOW) model, and apply them to the task of recognizing healthcare-related tweets. In the CBOW model, the vector representation of a word is learned from its contexts. To simplify the computation, the context is represented by the average of the vectors of all words inside the context window. However, not all words in the context window contribute equally to predicting the target word. Greedily incorporating every word in the window limits the contribution of the semantically useful words and brings noisy or irrelevant words into the learning process. Existing word embedding algorithms have also tried to learn a weighted CBOW model, but their weights are based on pre-defined syntactic rules and ignore the task for which the learned embedding will be used. We propose learning weights based on the words' relative importance in the classification task. Our intuition is that such weights place more emphasis on words that contribute comparatively more to the downstream task. We evaluate the embeddings learned by our algorithms on two healthcare-related datasets. The experimental results show that embeddings learned by the proposed algorithms outperform existing techniques, with a relative accuracy improvement of over 9%.
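The contrast the abstract draws between a plain averaged context and a task-weighted context can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their formulas, so the standard 2x2 chi-square statistic is assumed here as the word-importance score (suggested only by the title), and the names `chi_square` and `context_vector`, the toy embeddings, and the counts are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square statistic for a 2x2 word/class contingency table.

    a: health-related tweets containing the word
    b: other tweets containing the word
    c: health-related tweets without the word
    d: other tweets without the word
    (Assumed scoring function; the abstract does not specify one.)
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the word carries no class information
    return n * (a * d - b * c) ** 2 / denom


def context_vector(
    context: Sequence[str],
    embeddings: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Combine context-word vectors into a single context representation.

    With weights=None this is the plain CBOW average; otherwise each word
    vector is scaled by its weight and the sum is normalized by the total
    weight, so heavily weighted words dominate the context.
    """
    words = [w for w in context if w in embeddings]
    if not words:
        raise ValueError("no context word has an embedding")
    if weights is None:
        raw = [1.0] * len(words)
    else:
        raw = [weights.get(w, 0.0) for w in words]
    total = sum(raw)
    if total == 0.0:
        # All weights are zero: fall back to the unweighted average.
        raw = [1.0] * len(words)
        total = float(len(words))
    dim = len(embeddings[words[0]])
    return [
        sum(r * embeddings[w][i] for r, w in zip(raw, words)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings and made-up document counts.
    toy_embeddings = {"flu": [1.0, 0.0], "the": [0.0, 1.0]}
    toy_weights = {
        "flu": chi_square(40, 5, 10, 45),   # skewed toward the health class
        "the": chi_square(25, 25, 25, 25),  # evenly spread across classes
    }
    print(context_vector(["flu", "the"], toy_embeddings))
    print(context_vector(["flu", "the"], toy_embeddings, toy_weights))
```

In this toy setup a word that is spread evenly across classes receives a weight of zero, so the weighted context is pulled toward the class-discriminative word, which is the intuition the abstract describes.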
Pages: 12