Empirical Text Analysis for Identifying the Genres of Bengali Literary Work

Cited by: 1
Authors
Afroze, Ayesha [1]
Dutta, Kishowloy [1]
Sadik, Sadman [1]
Khanam, Sadia [1]
Rab, Raqeebir [1]
Rahim, Mohammad Asifur [1]
Affiliations
[1] Ahsanullah Univ Sci & Technol AUST, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
genre; Long Short-Term Memory (LSTM); Convolutional Neural Networks (CNN); Bidirectional Encoder Representations from Transformers (BERT); Support Vector Machines (SVM); Natural Language Processing; Book Snippets; Recurrent Neural Networks (RNN)
DOI
10.12720/jait.15.5.602-613
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Digital books and online retailers are becoming more popular every day, and readers differ in the genres they prefer. Categorizing books by genre makes it easier for readers to find books that match their tastes. Genre classification is the process of assigning a book to such a category. In this paper, we classify books by genre from their titles and snippets using a range of traditional machine learning and deep learning models. Such work exists for books in other languages but has not yet been done for Bengali novels. For this research, we collected data and built two datasets: one contains the titles of Bengali novels across nine genres, and the other contains book snippets from three genres. For classification, we employed logistic regression, Support Vector Machines (SVM), random forest classifiers, decision trees, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). Among all the models, BERT performed best on both datasets, with 90% accuracy on the snippets dataset and 77% accuracy on the titles dataset. Apart from BERT, traditional machine learning models performed better on the snippets dataset, whereas deep learning models performed better on the titles dataset. The performance varied with the amount of data and the number of words in each dataset.
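The title-based setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how titles were turned into features, so the character trigram TF-IDF features here are an assumption, and a small pure-Python softmax (multinomial logistic) regression stands in for the logistic regression model named above. The English titles and genre labels are invented placeholders, not data from the paper.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams; script-agnostic, so it also applies to Bengali text."""
    padded = f" {text.strip().lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def fit_tfidf(texts):
    """Learn a vocabulary and smoothed IDF weights from the training texts."""
    doc_freq = Counter()
    for t in texts:
        doc_freq.update(set(char_ngrams(t)))
    vocab = {g: i for i, g in enumerate(sorted(doc_freq))}
    n_docs = len(texts)
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    return vocab, idf

def transform(text, vocab, idf):
    """Sparse, L2-normalised TF-IDF vector as {feature_index: weight}."""
    counts = Counter(g for g in char_ngrams(text) if g in vocab)
    vec = {vocab[g]: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {i: v / norm for i, v in vec.items()}

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(vec, weights, bias):
    """Linear score per class for one sparse vector."""
    return [b + sum(w[i] * v for i, v in vec.items()) for w, b in zip(weights, bias)]

def train_logreg(vectors, labels, n_features, epochs=200, lr=0.5):
    """Multinomial logistic regression trained with per-sample gradient steps."""
    classes = sorted(set(labels))
    weights = [[0.0] * n_features for _ in classes]
    bias = [0.0] * len(classes)
    for _ in range(epochs):
        for vec, label in zip(vectors, labels):
            probs = softmax(class_scores(vec, weights, bias))
            for k, cls in enumerate(classes):
                err = probs[k] - (1.0 if cls == label else 0.0)
                bias[k] -= lr * err
                for i, v in vec.items():
                    weights[k][i] -= lr * err * v
    return classes, weights, bias

def predict(text, vocab, idf, model):
    """Return the highest-scoring genre label for a single title."""
    classes, weights, bias = model
    scores = class_scores(transform(text, vocab, idf), weights, bias)
    return classes[scores.index(max(scores))]

# Invented placeholder titles and genres (hypothetical, not from the paper).
titles = [
    "The Murder at the Old Mansion", "Detective and the Missing Jewel",
    "A Love Letter in Spring", "Hearts Under the Monsoon Rain",
    "Journey to the Red Planet", "Robots of the Distant Galaxy",
]
genres = ["mystery", "mystery", "romance", "romance",
          "science fiction", "science fiction"]

vocab, idf = fit_tfidf(titles)
train_vectors = [transform(t, vocab, idf) for t in titles]
model = train_logreg(train_vectors, genres, n_features=len(vocab))
print(predict("The Detective and the Old Jewel", vocab, idf, model))
```

Character n-grams are used here only because they need no language-specific tokenizer. A real experiment would need a held-out test split and far more data than this toy set to say anything about accuracy.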
Pages: 602-613
Number of pages: 12