Word embedding empowered topic recognition in news articles

Cited by: 0
Authors
Kaleem, Sidrah [1]
Jalil, Zakia [2]
Nasir, Muhammad [3]
Alazab, Moutaz [4, 5]
Affiliations
[1] Int Islamic Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Int Islamic Univ, Dept Data Sci & Artificial Intelligence, Islamabad, Pakistan
[3] Int Islamic Univ, Dept Software Engn, Islamabad, Pakistan
[4] Al Balqa Appl Univ, Fac Artificial Intelligence, Dept Intelligent Syst, Al Salt, Jordan
[5] Liverpool John Moores Univ, Oryx Universal Coll, Sch Comp & Data Sci, Doha, Qatar
Keywords
Artificial intelligence; Computer vision; Neural networks; Natural language processing; Word embedding; Topic modeling; MODEL;
DOI
10.7717/peerj-cs.2300
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Advances in technology have put global news at our fingertips, anytime and anywhere, through social media and online news sources, and methods for analyzing these extensive electronic text collections are urgently needed. Prior work suggests that combining topic models with word embedding models can improve text representation and help with downstream natural language processing tasks. However, the field of news topic recognition lacks a standardized approach to integrating the two, and existing algorithms tend to be overly complex and miss the potential benefits of fusion. To overcome these limitations, this research proposes a new technique, word embedding latent Dirichlet allocation, which combines topic models and word embeddings for better news topic recognition. The framework integrates probabilistic topic modeling using latent Dirichlet allocation with Gibbs sampling, semantic information from Word2Vec embeddings, and syntactic relationships to extract comprehensive text representations. Popular classifiers then use these representations to identify news topics automatically and precisely. By combining document-topic relationships with contextual information, the framework achieves better performance, greater expressiveness, and efficient dimensionality reduction. The proposed method significantly outperforms existing approaches, reaching 88% accuracy on 20NewsGroup and 97% accuracy on BBC News in news topic recognition.
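The general idea of pairing LDA topic proportions with word embeddings can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two representations are fused, so simple concatenation is assumed here, the collapsed Gibbs sampler is a textbook version, and the tiny corpus, `toy_embeddings` (a hand-written stand-in for trained Word2Vec vectors), and the helper names `lda_gibbs`, `mean_embedding`, and `fuse` are all hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Textbook collapsed Gibbs sampler for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                               # tokens in topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed document-topic proportions (each row sums to 1).
    thetas = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        thetas.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return thetas

def mean_embedding(doc, embeddings, dim):
    """Average the vectors of the words that have an embedding; zeros if none do."""
    vectors = [embeddings[w] for w in doc if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

def fuse(theta, embedding):
    """Assumed fusion step: concatenate topic proportions and the mean embedding."""
    return list(theta) + list(embedding)

# Hypothetical toy corpus and embeddings, for illustration only.
docs = [
    ["match", "goal", "team", "goal"],
    ["market", "stocks", "bank", "market"],
    ["team", "match", "win"],
    ["bank", "stocks", "profit"],
]
toy_embeddings = {
    "goal": [1.0, 0.0], "team": [0.9, 0.1],
    "market": [0.0, 1.0], "bank": [0.1, 0.9],
}
NUM_TOPICS, EMB_DIM = 2, 2
thetas = lda_gibbs(docs, NUM_TOPICS)
features = [fuse(t, mean_embedding(d, toy_embeddings, EMB_DIM)) for t, d in zip(thetas, docs)]
```

Each row of `features` has `NUM_TOPICS + EMB_DIM` entries and could be passed to any standard classifier. In practice the embeddings would come from a trained Word2Vec model rather than a hand-written dictionary.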
Pages: 23