Data Analysis of the Web News Headlines based on Natural Language Processing

Authors
Karna, Hrvoje [1, 2]
Braovic, Maja [3]
Vickovic, Linda [3]
Krstinic, Damir [3]
Affiliations
[1] Minist Def Republ Croatia, Zagreb, Croatia
[2] Univ Split, Split, Croatia
[3] Univ Split, Fac Elect Engn Mech Engn & Naval Architecture, Dept Elect & Comp, Split, Croatia
Keywords
data mining; information extraction; natural language processing; news portals; text analysis
DOI
10.24138/jcomss-2023-0047
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper explores the problem of media content data analysis, focusing on the phenomenon of vaccination, which is closely related to the COVID-19 pandemic. The presented research extends previous work but differs from it in two main areas. First, the text corpus submitted to the analysis has been considerably enlarged. Second, the previous data analysis was performed on the body of the posts, whereas the present analysis focuses on the most prominent part of news posts, their headlines. This change from body to headline analysis was motivated by significant differences in their characteristics and by the fact that most people read only headlines. The described data acquisition uses an advanced content collection approach, followed by a modeling process in which a set of natural language processing algorithms was applied. To enable the comparison, the model uses the same set of algorithms in the modeling phase as in the previous work. The main contributions of the work are: i) approaching the problem from a new perspective, ii) applying a more efficient method of data collection, and, crucially, iii) enabling the comparison of analysis results for individual parts of the content, which ensures a comprehensive insight into the characteristics of news posts.
Pages: 158-167
Number of pages: 10