Word embedding empowered topic recognition in news articles

Citations: 0
Authors
Kaleem, Sidrah [1]
Jalil, Zakia [2]
Nasir, Muhammad [3]
Alazab, Moutaz [4, 5]
Affiliations
[1] Int Islamic Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Int Islamic Univ, Dept Data Sci & Artificial Intelligence, Islamabad, Pakistan
[3] Int Islamic Univ, Dept Software Engn, Islamabad, Pakistan
[4] Al Balqa Appl Univ, Fac Artificial Intelligence, Dept Intelligent Syst, Al Salt, Jordan
[5] Liverpool John Moores Univ, Oryx Universal Coll, Sch Comp & Data Sci, Doha, Qatar
Keywords
Artificial intelligence; Computer vision; Neural networks; Natural language processing; Word embedding; Topic modeling; MODEL;
DOI
10.7717/peerj-cs.2300
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Advancements in technology have placed global news at our fingertips, anytime and anywhere, through social media and online news sources, and there is an urgent need to analyze these extensive electronic text collections. Prior work suggests that combining topic models with word embedding models can improve text representation and help with downstream natural language processing tasks. However, the field of news topic recognition lacks a standardized approach to integrating the two, and existing algorithms tend to be overly complex and miss out on the potential benefits of fusion. To overcome these limitations, this research proposes a new technique, word embedding latent Dirichlet allocation, that combines topic models and word embeddings for better news topic recognition. The framework integrates probabilistic topic modeling using latent Dirichlet allocation with Gibbs sampling, semantic insights from Word2Vec embeddings, and syntactic relationships to extract comprehensive text representations. Popular classifiers then use these representations to identify news topics automatically. Because the framework combines document-topic relationships with contextual information, it offers improved performance, enhanced expressiveness, and efficient dimensionality reduction. The authors report that the method significantly outperforms existing approaches, reaching 88% accuracy on 20NewsGroup and 97% accuracy on BBC News in news topic recognition.
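The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of fusing topic-model and embedding features, not the authors' method. It assumes a plain collapsed Gibbs sampler for latent Dirichlet allocation, a tiny hand-written table of toy vectors standing in for trained Word2Vec embeddings, and simple concatenation as the fusion step. The names `lda_gibbs`, `mean_embedding`, and `fuse` are hypothetical, and the classifier stage is omitted.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns document-topic proportions."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_dk counts
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_kw counts
    topic_total = [0] * num_topics                               # n_k counts
    assign = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed document-topic distribution (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


def mean_embedding(doc, vectors):
    """Average the word vectors of a non-empty document."""
    dim = len(vectors[0])
    return [sum(vectors[w][j] for w in doc) / len(doc) for j in range(dim)]


def fuse(docs, vectors, num_topics):
    """Concatenate topic proportions with the averaged embedding (assumed fusion)."""
    theta = lda_gibbs(docs, num_topics, vocab_size=len(vectors))
    return [theta[d] + mean_embedding(doc, vectors) for d, doc in enumerate(docs)]


# Toy vocabulary ids: 0 goal, 1 match, 2 team, 3 stock, 4 market, 5 bank.
# These 2-d vectors are made up for illustration; they are not Word2Vec output.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
docs = [[0, 1, 2, 0, 2], [3, 4, 5, 3, 5], [0, 1, 2, 1], [3, 4, 5, 4]]
features = fuse(docs, vectors, num_topics=2)
for row in features:
    print([round(x, 3) for x in row])
```

Each row of `features` holds two topic proportions followed by a two-dimensional averaged embedding. In a fuller pipeline these rows would be passed to a standard classifier, and the toy vectors would be replaced by embeddings trained on the news corpus.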
Pages: 23