Author and genre identification of Turkish news texts using deep learning algorithms

Cited by: 1
Authors
Tufekci, Pinar [1]
Bektas, Melike [2]
Affiliations
[1] Tekirdag Namik Kemal Univ, Corlu Fac Engn, Dept Comp Engn, Tekirdag, Turkey
[2] Bursa Tech Univ, Dept Informat Technol, Bursa, Turkey
Source
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES | 2022 / Vol. 47 / Issue 04
Keywords
Author identification; genre identification; deep learning; text classification; Turkish news datasets; machine learning; CATEGORIZATION;
DOI
10.1007/s12046-022-01975-3
Chinese Library Classification number
T [Industrial Technology];
Subject classification code
08;
Abstract
The growing volume of data has created a need to classify it. Text classification is the process of grouping similar texts into categories. This paper presents a modeling study of author and genre identification, two important challenges in text classification, for Turkish news texts using machine learning and deep learning algorithms. For this purpose, a total of 13 new large-scale, multi-class datasets were first built. In the modeling stage, the Multinomial Naive Bayes (MNB), Random Forest (RF), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) algorithms were applied to the datasets. For author identification, the CNN algorithm achieved the highest accuracy, 95.81%, on dataset AI-TNKU-7. For genre identification, the LSTM algorithm achieved the highest accuracy, 96.73%, on dataset GI-TNKU-6.
Pages: 10