Detection of Stopwords in Classical Chinese Poetry

Cited by: 0
Authors
Peng, Lei [1]
Ma, Xiaodong [2, 3]
Teng, Zheng [4]
Affiliations
[1] Chongqing Three Gorges Med Coll, Lib & Informat Sci Ctr, Chongqing, Peoples R China
[2] INTI Int Univ, Fac Data Sci & Informat Technol, Nilai, N Sembilan, Malaysia
[3] HuangHe Sci & Technol Univ, Sch Int, Zhengzhou, Henan, Peoples R China
[4] Chongqing Three Gorges Med Coll, Sch Med Technol, Chongqing, Peoples R China
Keywords
TF-IDF; stopwords; Chinese; poetry; frequency
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this research, we address the problem of stopword detection in Classical Chinese Poetry, an area that has not been explored previously. Stopword detection is crucial in text mining, because identifying and removing stopwords is essential for improving the performance of various natural language processing models. Inspired by the TF-IDF method, we propose a novel approach that uses external knowledge to reconstruct the term weight matrix. Our key finding is that incorporating external knowledge significantly refines the granularity of the term weights, thereby improving the effectiveness of stopword detection. Based on these findings, we conclude that external knowledge can enhance text representation, especially for the short texts of Classical Chinese Poetry.
Pages: 255-261
Page count: 7