Word embedding empowered topic recognition in news articles

Authors
Kaleem, Sidrah [1]
Jalil, Zakia [2]
Nasir, Muhammad [3]
Alazab, Moutaz [4, 5]
Affiliations
[1] Int Islamic Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Int Islamic Univ, Dept Data Sci & Artificial Intelligence, Islamabad, Pakistan
[3] Int Islamic Univ, Dept Software Engn, Islamabad, Pakistan
[4] Al Balqa Appl Univ, Fac Artificial Intelligence, Dept Intelligent Syst, Al Salt, Jordan
[5] Liverpool John Moores Univ, Oryx Universal Coll, Sch Comp & Data Sci, Doha, Qatar
Keywords
Artificial intelligence; Computer vision; Neural networks; Natural language processing; Word embedding; Topic modeling; MODEL;
DOI
10.7717/peerj-cs.2300
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Advancements in technology have placed global news at our fingertips, anytime and anywhere, through social media and online news sources, so methods for analyzing these extensive electronic text collections are urgently needed. According to scholars, combining topic models with word embedding models could improve text representation and help with downstream natural language processing tasks. However, the field of news topic recognition lacks a standardized approach to integrating the two, and existing algorithms tend to be overly complex and miss the potential benefits of fusion. To overcome these limitations, this research proposes a new technique, word embedding latent Dirichlet allocation, that combines topic models and word embeddings for better news topic recognition. The framework integrates probabilistic topic modeling using latent Dirichlet allocation with Gibbs sampling, semantic insights from Word2Vec embeddings, and syntactic relationships to extract comprehensive text representations. Popular classifiers then use these representations to identify news topics automatically. By combining document-topic relationships with contextual information, the framework aims for better performance, greater expressiveness, and efficient dimensionality reduction. The authors report that the method significantly outperforms existing approaches, reaching 88% and 97% accuracy on 20NewsGroup and BBC News, respectively, in news topic recognition.
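As a rough illustration of what fusing a topic model with word embeddings can look like, the sketch below concatenates a document-topic distribution with the mean of the document's word vectors to form one feature vector for a classifier. This is only a sketch under stated assumptions: the abstract does not say how the two representations are combined, so concatenation is an assumption, and the function names, the toy word vectors, and the toy topic distribution are all hypothetical stand-ins for the output of a trained LDA model and a trained Word2Vec model.

```python
from typing import Dict, List, Sequence


def mean_word_vector(
    tokens: Sequence[str], word_vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the embeddings of tokens that have a vector; zeros if none do."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def fuse_features(
    topic_dist: Sequence[float],
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Concatenate a document-topic distribution with the mean word embedding.

    Concatenation is an assumed fusion strategy, not the paper's confirmed one.
    """
    return list(topic_dist) + mean_word_vector(tokens, word_vectors, dim)


# Hypothetical toy inputs standing in for trained Word2Vec and LDA outputs.
word_vectors = {"match": [1.0, 0.0], "goal": [0.0, 1.0], "market": [-1.0, 0.0]}
topic_dist = [0.75, 0.25]  # assumed P(topic | document) for two topics

# "unseen" has no embedding and is skipped when averaging.
features = fuse_features(topic_dist, ["match", "goal", "unseen"], word_vectors, dim=2)
print(features)  # [0.75, 0.25, 0.5, 0.5]
```

The resulting vector has one entry per topic plus one per embedding dimension, so it stays far smaller than a bag-of-words vector, which is one way to read the abstract's reference to dimensionality reduction.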
Pages: 23