Word embedding empowered topic recognition in news articles

Authors
Kaleem, Sidrah [1]
Jalil, Zakia [2]
Nasir, Muhammad [3]
Alazab, Moutaz [4, 5]
Affiliations
[1] Int Islamic Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Int Islamic Univ, Dept Data Sci & Artificial Intelligence, Islamabad, Pakistan
[3] Int Islamic Univ, Dept Software Engn, Islamabad, Pakistan
[4] Al Balqa Appl Univ, Fac Artificial Intelligence, Dept Intelligent Syst, Al Salt, Jordan
[5] Liverpool John Moores Univ, Oryx Universal Coll, Sch Comp & Data Sci, Doha, Qatar
Keywords
Artificial intelligence; Computer vision; Neural networks; Natural language processing; Word embedding; Topic modeling; MODEL;
DOI
10.7717/peerj-cs.2300
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Advances in technology have placed global news at our fingertips, anytime and anywhere, through social media and online news sources, so methods for analyzing these extensive electronic text collections are urgently needed. Prior work suggests that combining topic models with word embedding models can improve text representation and help with downstream natural language processing tasks. However, news topic recognition lacks a standardized approach to integrating the two, and existing algorithms tend to be overly complex and miss the potential benefits of fusion. To address these limitations, this research proposes a new technique, word embedding latent Dirichlet allocation, that combines topic models and word embeddings for better news topic recognition. The framework integrates probabilistic topic modeling using latent Dirichlet allocation with Gibbs sampling, semantic information from Word2Vec embeddings, and syntactic relationships to extract comprehensive text representations. Popular classifiers then use these representations to identify news topics automatically. Because the framework combines document-topic relationships with contextual information, it offers improved performance, greater expressiveness, and efficient dimensionality reduction. The proposed method significantly outperforms existing approaches in news topic recognition, reaching 88% accuracy on 20NewsGroup and 97% on BBC News.
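The fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain collapsed Gibbs sampler for latent Dirichlet allocation, uses random placeholder vectors where trained Word2Vec embeddings would go, and simply concatenates each document's topic proportions with its mean word vector. The function names (`lda_gibbs`, `mean_embedding`), the toy corpus, and all hyperparameter values are hypothetical, and the syntactic features and the classifier stage are omitted.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_dk counts
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_kw counts
    topic_total = [0] * num_topics                               # n_k counts
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed document-topic proportions (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta

def mean_embedding(doc, vectors):
    """Average the word vectors of a non-empty document."""
    dim = len(vectors[0])
    return [sum(vectors[w][j] for w in doc) / len(doc) for j in range(dim)]

# Toy corpus (hypothetical), already tokenized.
texts = [
    "match goal team goal".split(),
    "team match win".split(),
    "market stock bank".split(),
    "bank stock market stock".split(),
]
vocab = sorted({w for t in texts for w in t})
word_id = {w: i for i, w in enumerate(vocab)}
docs = [[word_id[w] for w in t] for t in texts]

# Placeholder 3-d vectors standing in for trained Word2Vec embeddings (assumption).
vec_rng = random.Random(1)
vectors = [[vec_rng.uniform(-1, 1) for _ in range(3)] for _ in vocab]

theta = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
# Fused representation: topic proportions concatenated with the mean word vector.
features = [theta[d] + mean_embedding(docs[d], vectors) for d in range(len(docs))]
```

Each row of `features` has one entry per topic followed by one entry per embedding dimension, and could be passed to any standard classifier. Concatenation is only one possible fusion strategy and is chosen here for simplicity.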
Pages: 23
Related papers
50 in total
  • [31] Comparative analysis with topic modeling and word embedding methods after the Aegean Sea earthquake on Twitter
    Nazmiye Eligüzel
    Cihan Çetinkaya
    Türkay Dereli
    Evolving Systems, 2023, 14 : 245 - 261
  • [32] Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding
    Ming Liu
    Bo Lang
    Zepeng Gu
    Ahmed Zeeshan
    Tsinghua Science and Technology, 2017, 22 (06) : 619 - 632
  • [33] Word embedding and classification methods and their effects on fake news detection
    Hauschild, Jessica
    Eskridge, Kent
    MACHINE LEARNING WITH APPLICATIONS, 2024, 17
  • [34] Enhancing Fake News Detection with Word Embedding: A Machine Learning and Deep Learning Approach
    Al-Tarawneh, Mutaz A. B.
    Al-irr, Omar
    Al-Maaitah, Khaled S.
    Kanj, Hassan
    Aly, Wael Hosny Fouad
    COMPUTERS, 2024, 13 (09)
  • [35] Chinese Textual Entailment Recognition Enhanced with Word Embedding
    Zhang, Zhichang
    Yao, Dongren
    Pang, Yali
    Lu, Xiaoyong
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 89 - 100
  • [36] Topic Modeling, Sentiment Analysis and Text Summarization for Analyzing News Headlines and Articles
    Thakur, Omswroop
    Saritha, Sri Khetwat
    Jain, Sweta
    MACHINE LEARNING, IMAGE PROCESSING, NETWORK SECURITY AND DATA SCIENCES, MIND 2022, PT I, 2022, 1762 : 220 - 239
  • [37] Identification of topic evolution: network analytics with piecewise linear representation and word embedding
    Huang, Lu
    Chen, Xiang
    Zhang, Yi
    Wang, Changtian
    Cao, Xiaoli
    Liu, Jiarun
    SCIENTOMETRICS, 2022, 127 (09) : 5353 - 5383
  • [38] Topic Enhanced Word Embedding for Toxic Content Detection in Q&A Sites
    Kim, Do Yeon
    Li, Xiaohang
    Wang, Sheng
    Zhuo, Yunying
    Lee, Roy Ka-Wei
    PROCEEDINGS OF THE 2019 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2019), 2019, : 1064 - 1071
  • [39] Extractive Myanmar News Summarization Using Centroid Based Word Embedding
    Lwin, Soe Soe
    Nwet, Khin Thandar
    2019 INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION TECHNOLOGIES (ICAIT), 2019, : 200 - 205
  • [40] News Topic-typed Microblog Opinion Sentence Recognition
    Fang, Yi Cheng
    Du, Ya Jun
    Tang, Ming Wei
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016, : 2385 - 2390