Detection of Stopwords in Classical Chinese Poetry

Cited by: 0
Authors
Peng, Lei [1]
Ma, Xiaodong [2, 3]
Teng, Zheng [4]
Affiliations
[1] Chongqing Three Gorges Med Coll, Lib & Informat Sci Ctr, Chongqing, Peoples R China
[2] INTI Int Univ, Fac Data Sci & Informat Technol, Nilai, N Sembilan, Malaysia
[3] HuangHe Sci & Technol Univ, Sch Int, Zhengzhou, Henan, Peoples R China
[4] Chongqing Three Gorges Med Coll, Sch Med Technol, Chongqing, Peoples R China
Keywords
TF-IDF; stopwords; Chinese; poetry; frequency
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this research, we address the problem of stopword detection in Classical Chinese Poetry, an area that has not been explored previously. Stopword detection is crucial in text mining, because identifying and removing stopwords is essential to improving the performance of many natural language processing models. Inspired by the TF-IDF method, we propose a novel approach that uses external knowledge to reconstruct the term weight matrix. Our key finding is that incorporating external knowledge significantly refines the granularity of the term weights, thereby improving the effectiveness of stopword detection. Based on these findings, we conclude that external knowledge can strengthen text representation, especially for the short texts found in Classical Chinese Poetry.
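
As a rough illustration of the TF-IDF intuition the abstract builds on, the sketch below ranks characters by inverse document frequency, so that characters occurring in most poems surface as stopword candidates. It is not the authors' method: the abstract does not say how external knowledge is used to rebuild the term weight matrix, so that step is omitted. The function name `stopword_candidates`, the `top_k` parameter, and the four toy "poems" are assumptions made up for this example.

```python
import math
from collections import Counter


def stopword_candidates(poems, top_k=1):
    """Return the top_k lowest-IDF characters and the full IDF table.

    Plain IDF only; the paper's external-knowledge reweighting is NOT modelled.
    """
    n_docs = len(poems)
    doc_freq = Counter()
    for poem in poems:
        # Count each character once per poem; skip punctuation and whitespace.
        doc_freq.update({ch for ch in poem if ch.isalnum()})
    # A low IDF means the character appears in a large share of the poems.
    idf = {ch: math.log(n_docs / df) for ch, df in doc_freq.items()}
    # Sort by IDF, breaking ties by character so the order is deterministic.
    ranked = sorted(idf, key=lambda ch: (idf[ch], ch))
    return ranked[:top_k], idf


# Hypothetical toy corpus (not taken from the paper's data).
poems = ["不见山", "不知春", "不问月", "山月明"]
candidates, idf = stopword_candidates(poems, top_k=1)
print(candidates)
```

Because classical poems are very short, raw document frequency gives only a coarse ranking with many ties, which is consistent with the abstract's motivation for refining the term weights with additional information.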
Pages: 255-261
Number of pages: 7