Effect of Stemming on Hindi Text Classification

Cited by: 0
Authors
Pimpalshende, Anjusha [1]
Singh, Preety [1]
Potnurwar, Archana [2]
Affiliations
[1] VNR Vignana Jyothi Inst Engn Technol, Dept CSE, Hyderabad, India
[2] Priyadarshini Coll Engn, Dept IT, Nagpur, India
Source
INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING | 2023 / Vol. 14 / No. 01
Keywords
Text classification; Hindi text; Stemmer; suffix; prefix; Information retrieval; syntactic parsing
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Text classification makes it easier to search the large amount of textual data available online by dividing it into smaller, relevant units. Nowadays, a large number of digital documents are available in Indian languages. Designing text classifiers for Indian languages is an active research area, so that people can search and read the documents they need in their local languages. In the proposed work, we design a text classifier for Hindi text documents and show how a stemmer affects the performance of Hindi text classifiers. Stemming is the process of converting the words of a language to their base or root forms. Stemmers are applied to written documents, not to spoken language. The performance of many applications, such as text summarization, information retrieval (IR) systems, text classification systems, and syntactic parsing, can be improved by applying stemmers. A stemmer removes the suffix or prefix of a word to recover its root word. These root words help in the preprocessing step required by many algorithms. We applied various stemmers to Hindi text classification models. The experimental results show that the performance of the classifiers improves when stemmers are applied.
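
The suffix-removal idea described in the abstract can be illustrated with a minimal sketch. This is not the stemmer evaluated in the paper: the suffix list, the function name `stem`, and the `min_stem_len` parameter are all assumptions made for illustration, and a practical Hindi stemmer would need a much larger, linguistically motivated rule set.

```python
# A minimal suffix-stripping sketch for Hindi words.
# The suffix list is illustrative only (an assumption, not from the paper).
# It is sorted longest first so the longest matching suffix is removed.
SUFFIXES = sorted(
    ["ियों", "ियाँ", "ाओं", "ों", "ें", "ी", "े", "ा", "ो"],
    key=len,
    reverse=True,
)


def stem(word: str, min_stem_len: int = 2) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied; return the word unchanged


if __name__ == "__main__":
    for w in ["किताबें", "किताबों", "किताब"]:
        print(w, "->", stem(w))  # each form maps to किताब
```

Because inflected forms such as किताबें and किताबों collapse to one stem, a bag-of-words classifier sees a single feature instead of several sparse ones, which is one plausible reason stemming can help classification.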
Pages: 208-215
Page count: 8