Automatic Categorized Corpus Creation of Hindi Poetries Based on Rasa(s) for Linguistics Research

Cited by: 0
Authors
Pal, Kaushika [1]
Patel, Biraj V. [2]
Affiliations
[1] Sarvajanik Coll Engn & Technol, Surat, Gujarat, India
[2] Sardar Patel Univ, Dept Comp Sci & Technol, Vv Nagar, Gujarat, India
Source
SMART SYSTEMS: INNOVATIONS IN COMPUTING (SSIC 2021) | 2022 / Vol. 235
DOI
10.1007/978-981-16-2877-1_50
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Corpus creation is a prerequisite for solving natural language processing problems and for linguistic research that uses machine learning and deep learning. Classification problems solved with supervised learning methods need a labeled corpus, and labeling one manually is a tedious task. This article describes the automatic creation of a labeled corpus of Hindi poetry categorized by Rasa. The extracted documents are preprocessed with NLP techniques to produce a clean corpus that is ready to use for classification or other problems. Garbage elements are removed from the extracted files, and any file containing excess garbage is deleted. The corpus comprises poetry of the 9 Rasa(s), together called Navrasa. A total of 1089 files were extracted, comprising 333 Adbhuta, 5 Bhayank, 25 Hasya, 70 Karuna, 6 Raudra, 277 Shanta, 218 Shringar, 147 Veera, and 8 Vibhasta Rasa poems, in 04:20:23 hrs with negligible human intervention. A total of 17 garbage files were deleted, and the final corpus has 1072 poetry documents, comprising 328 Adbhuta, 5 Bhayank, 25 Hasya, 68 Karuna, 6 Raudra, 272 Shanta, 216 Shringar, 144 Veera, and 8 Vibhasta Rasa poems.
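The per-Rasa counts in the abstract can be tallied to check the stated totals and to see which classes lost files during cleaning. The following is a minimal sketch for that arithmetic only; the dictionary names are illustrative and are not taken from the paper, and the Rasa spellings are copied as they appear in the abstract.

```python
# Per-Rasa file counts as reported in the abstract.
# Variable names are hypothetical; only the numbers come from the source.
extracted = {
    "Adbhuta": 333, "Bhayank": 5, "Hasya": 25, "Karuna": 70, "Raudra": 6,
    "Shanta": 277, "Shringar": 218, "Veera": 147, "Vibhasta": 8,
}
final = {
    "Adbhuta": 328, "Bhayank": 5, "Hasya": 25, "Karuna": 68, "Raudra": 6,
    "Shanta": 272, "Shringar": 216, "Veera": 144, "Vibhasta": 8,
}

# Files deleted as garbage, per Rasa.
removed = {rasa: extracted[rasa] - final[rasa] for rasa in extracted}

total_extracted = sum(extracted.values())
total_final = sum(final.values())
total_removed = sum(removed.values())

# Share of the final corpus held by each Rasa, in percent.
share = {rasa: round(100 * n / total_final, 1) for rasa, n in final.items()}

print("extracted:", total_extracted)
print("final:", total_final)
print("removed:", total_removed)
for rasa in final:
    print(f"{rasa:9s} final={final[rasa]:4d} removed={removed[rasa]} share={share[rasa]}%")
```

The totals agree with the abstract (1089 extracted, 17 deleted, 1072 retained). The tally also shows that the corpus is strongly imbalanced: Adbhuta has 328 documents while Bhayank has 5, which anyone training a classifier on it would likely need to account for.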
Pages: 549-556
Number of pages: 8